SECRETED PROTEINS AND NUCLEIC ACIDS ENCODING THEM

Related Application Information 
This application is a continuation-in-part of
application serial number 09/164,169, filed October 2,
1998, which is a continuation-in-part of application
serial number 09/164,220, filed September 30, 1998.

Background of the Invention 

Many secreted proteins, for example, cytokines and
cytokine receptors, play a vital role in the regulation
of cell growth, cell differentiation, and a variety of
specific cellular responses. A number of medically
useful proteins, including erythropoietin,
granulocyte-macrophage colony stimulating factor, human
growth hormone, and various interleukins, are secreted
proteins. Thus, an important goal in the design and
development of new therapies is the identification and
characterization of secreted and transmembrane proteins
and the genes which encode them.

Many secreted proteins are receptors which bind a
ligand and transduce an intracellular signal, leading to
a variety of cellular responses. The identification and
characterization of such a receptor enables one to
identify both the ligands which bind to the receptor and
the intracellular molecules and signal transduction
pathways associated with the receptor, permitting one to
identify or design modulators of receptor activity, e.g.,
receptor agonists or antagonists and modulators of signal
transduction.



Summary of the Invention 
The present invention is based, at least in part, on
the discovery of cDNA molecules encoding TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO 185, TANGO
186, TANGO 187, TANGO 188, TANGO 189, and TANGO 215, all
of which are predicted to be either wholly secreted or
transmembrane proteins. These proteins, fragments,
derivatives, and variants thereof are collectively
referred to as "polypeptides of the invention" or
"proteins of the invention." Nucleic acid molecules
encoding polypeptides of the invention are collectively
referred to as "nucleic acids of the invention."

The nucleic acids and polypeptides of the present
invention are useful as modulating agents in regulating a
variety of cellular processes. Accordingly, in one
aspect, the present invention provides isolated nucleic
acid molecules encoding a polypeptide of the invention or
a biologically active portion thereof. The present
invention also provides nucleic acid molecules which are
suitable as primers or hybridization probes for the
detection of nucleic acids encoding a polypeptide of the
invention.

The invention features nucleic acid molecules which are
at least 45% (or 55%, 65%, 75%, 85%, 95%, or 98%)
identical to the nucleotide sequence of any of SEQ ID
NOs:1-22, 34-43, and ___, or the nucleotide sequence
of the cDNA of a clone deposited with ATCC as any of
Accession Numbers 98899, 98900, and 98901 (the "cDNA of a
clone deposited as any of ATCC 98899, 98900, and
98901"), or a complement thereof.

The invention features nucleic acid molecules which
include a fragment of at least 300 (325, 350, 375, 400,
425, 450, 500, 550, 600, 650, 700, 800, 900, 1000, or
1200) nucleotides of the nucleotide sequence of any of
SEQ ID NOs:1-22, 34-43, and ___, or the nucleotide
sequence of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, or a complement thereof.

The invention also features nucleic acid molecules
which include a nucleotide sequence encoding a protein
having an amino acid sequence that is at least 45% (or
55%, 65%, 75%, 85%, 95%, or 98%) identical to the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and ___,
or the amino acid sequence encoded by the cDNA of a
clone deposited as any of ATCC 98899, 98900, and 98901,
or a complement thereof.

In preferred embodiments, the nucleic acid molecules
have the nucleotide sequence of any of SEQ ID NOs:1-22,
34-43, and ___, or the nucleotide sequence of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
98901.

Also within the invention are nucleic acid molecules
which encode a fragment of a polypeptide having the amino
acid sequence of any of SEQ ID NOs:23-33, 54-63, and
___, the fragment including at least 15 (25, 30, 50,
100, 150, 300, or 400) contiguous amino acids of any of
SEQ ID NOs:23-33, 54-63, and ___, or the polypeptide
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901.

The invention includes nucleic acid molecules which
encode a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID NOs:23-33, 54-63, and ___, or an amino acid
sequence encoded by the cDNA of a clone deposited as any
of ATCC 98899, 98900, and 98901, wherein the nucleic
acid molecule hybridizes under stringent conditions to a
nucleic acid molecule having a nucleic acid sequence
encoding any of SEQ ID NOs:22-33, 54-63, and ___,
or a complement thereof.

Also within the invention are: isolated polypeptides or
proteins having an amino acid sequence that is at least
about 65%, preferably 75%, 85%, 95%, or 98% identical to
the amino acid sequence of any of SEQ ID NOs:22-33,
54-63, and ___.

Also within the invention are: isolated polypeptides or
proteins which are encoded by a nucleic acid molecule
having a nucleotide sequence that is at least about 65%,
preferably 75%, 85%, or 95% identical to the nucleic acid
sequence encoding any of SEQ ID NOs:22-33, 54-63, and
___; and isolated polypeptides or proteins which are
encoded by a nucleic acid molecule having a nucleotide
sequence which hybridizes under stringent hybridization
conditions to a nucleic acid molecule having the sequence
of any of SEQ ID NOs:1-22, 34-43, and ___, and a
complement thereof or the non-coding strand of the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
98901.

Also within the invention are polypeptides which are
naturally occurring allelic variants of a polypeptide
that includes the amino acid sequence of any of SEQ ID
NOs:22-33, 54-63, and ___, or an amino acid sequence
encoded by the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes under
stringent conditions to a nucleic acid molecule having
the sequence of any of SEQ ID NOs:1-22, 34-43, and ___,
or a complement thereof.

The invention also features nucleic acid molecules that
hybridize under stringent conditions to a nucleic acid
molecule comprising the nucleotide sequence of any of SEQ
ID NOs:1-22, 34-43, and ___, of the cDNA of a clone
deposited as any of ATCC 98899, 98900, and 98901, or a
complement thereof. In other embodiments, the nucleic
acid molecules are at least 300 (325, 350, 375, 400, 425,
450, 500, 550, 600, 650, 700, 800, 900, 1000, or 1290)
nucleotides in length and hybridize under stringent
conditions to a nucleic acid molecule comprising the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
___, of the cDNA of a clone deposited as any of ATCC
98899, 98900, and 98901, or a complement thereof. In
preferred embodiments, the isolated nucleic acid
molecules encode a cytoplasmic, transmembrane, or
extracellular domain of a polypeptide of the invention.
In another embodiment, the invention provides an isolated
nucleic acid molecule which is antisense to the coding
strand of a nucleic acid of the invention.
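
As a minimal sketch of what "a complement thereof" and
"antisense to the coding strand" mean at the sequence level,
the following Python snippet computes the reverse complement
of a DNA string. The function name and the example sequence
are illustrative assumptions; the sequence is made up and is
not any of the SEQ ID NOs referred to herein.

```python
# Illustrative only: a made-up six-base sequence is used below,
# not a sequence of the invention.

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(coding_strand: str) -> str:
    """Return the reverse complement of a DNA sequence.

    The input is read 5'->3'; the result is the complementary
    strand, also written 5'->3'.
    """
    return coding_strand.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sequence,
which is a quick sanity check on any implementation.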

Another aspect of the invention provides vectors, e.g.,
recombinant expression vectors, comprising a nucleic acid
molecule of the invention. In another embodiment the
invention provides host cells containing such a vector.
The invention also provides methods for producing a
polypeptide of the invention by culturing, in a suitable
medium, a host cell of the invention containing a
recombinant expression vector encoding a polypeptide of
the invention such that the polypeptide of the invention
is produced.

Another aspect of this invention features isolated or
recombinant proteins and polypeptides of the invention.
Preferred proteins and polypeptides possess at least one
biological activity possessed by the corresponding
naturally-occurring human polypeptide. An activity, a
biological activity, and a functional activity of a
polypeptide of the invention refer to an activity
exerted by a protein or polypeptide of the invention on a
responsive cell as determined in vivo, or in vitro,
according to standard techniques. Such activities can be
a direct activity, such as an association with or an
enzymatic activity on a second protein, or an indirect
activity, such as a cellular signaling activity mediated
by interaction of the protein with a second protein.
Thus, such activities include, e.g., (1) the ability to
form protein-protein interactions with proteins in the
signaling pathway of the naturally-occurring
polypeptide; (2) the ability to bind a ligand of the
naturally-occurring polypeptide; and (3) the ability to
bind to an intracellular target of the naturally-occurring
polypeptide. Other activities include: (1) the ability
to modulate cellular proliferation; (2) the ability to
modulate cellular differentiation; and (3) the ability to
modulate cell death.

In one embodiment, a polypeptide of the invention has
an amino acid sequence sufficiently identical to an
identified domain of a polypeptide of the invention. As
used herein, the term "sufficiently identical" refers to
a first amino acid or nucleotide sequence which contains
a sufficient or minimum number of identical or equivalent
(e.g., with a similar side chain) amino acid residues or
nucleotides to a second amino acid or nucleotide sequence
such that the first and second amino acid or nucleotide
sequences have a common structural domain and/or common
functional activity. For example, amino acid or
nucleotide sequences which contain a common structural
domain having about 65% identity, preferably 75%
identity, more preferably 85%, 95%, or 98% identity are
defined herein as sufficiently identical.
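
The identity thresholds above can be illustrated with a short
Python sketch. It assumes the two sequences have already been
aligned (equal length, with "-" marking gaps) and it counts
identity as matching columns divided by aligned columns; both
the function names and that choice of denominator are
assumptions made for illustration, and other conventions for
computing percent identity exist. Equivalent (similar side
chain) residues are not scored here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored.
    A column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def sufficiently_identical(
    aligned_a: str, aligned_b: str, threshold: float = 65.0
) -> bool:
    """True if percent identity meets the chosen threshold (default 65%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up four-residue example: three of four columns match.
    print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Raising the threshold argument to 75, 85, 95, or 98 corresponds
to the progressively preferred levels named in the text.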

In one embodiment, the isolated polypeptide of the
invention lacks both a transmembrane and a cytoplasmic
domain. In another embodiment, the polypeptide lacks
both a transmembrane domain and a cytoplasmic domain and
is soluble under physiological conditions.

The polypeptides of the present invention, or
biologically active portions thereof, can be operably
linked to a heterologous amino acid sequence to form
fusion proteins. The invention further features
antibodies that specifically bind a polypeptide of the
invention, such as monoclonal or polyclonal antibodies.

In addition, the polypeptides of the invention or
biologically active portions thereof can be incorporated
into pharmaceutical compositions, which optionally
include pharmaceutically acceptable carriers.

In another aspect, the present invention provides
methods for detecting the presence of the activity or
expression of a polypeptide of the invention in a
biological sample by contacting the biological sample
with an agent capable of detecting an indicator of
activity such that the presence of activity is detected
in the biological sample.

In another aspect, the invention provides methods for
modulating activity of a polypeptide of the invention
comprising contacting a cell with an agent that modulates
(inhibits or stimulates) the activity or expression of a
polypeptide of the invention such that activity or
expression in the cell is modulated. In one embodiment,
the agent is an antibody that specifically binds to a
polypeptide of the invention.

In another embodiment, the agent modulates expression
of a polypeptide of the invention by modulating
transcription, splicing, or translation of an mRNA
encoding a polypeptide of the invention. In yet another
embodiment, the agent is a nucleic acid molecule having a
nucleotide sequence that is antisense to the coding
strand of an mRNA encoding a polypeptide of the
invention.

The present invention also provides methods to treat a
subject having a disorder characterized by aberrant
activity of a polypeptide of the invention or aberrant
expression of a nucleic acid of the invention by
administering an agent which is a modulator of the
activity of a polypeptide of the invention or a modulator
of the expression of a nucleic acid of the invention to
the subject. In one embodiment, the modulator is a
protein of the invention. In another embodiment, the
modulator is a nucleic acid of the invention. In other
embodiments, the modulator is a peptide, peptidomimetic,
or other small molecule.

The present invention also provides diagnostic assays
for identifying the presence or absence of a genetic
lesion or mutation characterized by at least one of: (i)
aberrant modification or mutation of a gene encoding a
polypeptide of the invention, (ii) mis-regulation of a
gene encoding a polypeptide of the invention, and (iii)
aberrant post-translational modification of a polypeptide
of the invention, wherein a wild-type form of the gene
encodes a polypeptide having the activity of the
polypeptide of the invention.

In another aspect, the invention provides a method for
identifying a compound that binds to or modulates the
activity of a polypeptide of the invention. In general,
such methods entail measuring a biological activity of
the polypeptide in the presence and absence of a test
compound and identifying those compounds which alter the
activity of the polypeptide.

The invention also features methods for identifying a
compound which modulates the expression of a polypeptide
or nucleic acid of the invention by measuring the
expression of the polypeptide or nucleic acid in the
presence and absence of the compound.

Other features and advantages of the invention will be
apparent from the following detailed description and
claims.

Brief Description of the Drawings 
Figure 1 depicts the cDNA sequence (SEQ ID NO:1) and
predicted amino acid sequence (SEQ ID NO:23) of human
TANGO 180.



WO 00/18904 



PCT/US99/22817 




Figure 2 depicts the cDNA sequence (SEQ ID NO:34) and
predicted amino acid sequence (SEQ ID NO:54) of murine
TANGO 180.

Figure 3 depicts the cDNA sequence (SEQ ID NO:2) and
predicted amino acid sequence (SEQ ID NO:24) of human
TANGO 181.

Figure 4 depicts the partial cDNA sequence (SEQ ID
NO:35; partial) and predicted amino acid sequence (SEQ ID
NO:55; partial) of murine TANGO 181.

Figure 5 depicts the cDNA sequence (SEQ ID NO:3) and
predicted amino acid sequence (SEQ ID NO:25) of human
TANGO 182.

Figure 6 depicts the partial cDNA sequence (SEQ ID
NO:36; partial) and predicted amino acid sequence (SEQ ID
NO:56; partial) of murine TANGO 182.

Figure 7 depicts the cDNA sequence (SEQ ID NO:4) and
predicted amino acid sequence (SEQ ID NO:26) of human
TANGO 183.

Figure 8 depicts the cDNA sequence (SEQ ID NO:37) and
predicted amino acid sequence (SEQ ID NO:57) of murine
TANGO 183.

Figure 9 depicts the cDNA sequence (SEQ ID NO:5) and
predicted amino acid sequence (SEQ ID NO:27) of human
TANGO 184.

Figure 10 depicts the cDNA sequence (SEQ ID NO:38) and
predicted amino acid sequence (SEQ ID NO:58) of murine
TANGO 184.

Figure 11 depicts the cDNA sequence (SEQ ID NO:6) and
predicted amino acid sequence (SEQ ID NO:28) of human
TANGO 185.

Figure 12 depicts the cDNA sequence (SEQ ID NO:39) and
predicted amino acid sequence (SEQ ID NO:59) of murine
TANGO 185.

Figure 13 depicts the cDNA sequence (SEQ ID NO:7) and
predicted amino acid sequence (SEQ ID NO:29) of human
TANGO 186.

Figure 14 depicts the cDNA sequence (SEQ ID NO:40) and
predicted amino acid sequence (SEQ ID NO:60) of murine
TANGO 186.

Figure 15 depicts the cDNA sequence (SEQ ID NO:8) and
predicted amino acid sequence (SEQ ID NO:30) of human
TANGO 188.

Figure 16 depicts the cDNA sequence (SEQ ID NO:41) and
predicted amino acid sequence (SEQ ID NO:61) of murine
TANGO 188.

Figure 17 depicts the cDNA sequence (SEQ ID NO:9) and
predicted amino acid sequence (SEQ ID NO:31) of human
TANGO 189.

Figure 18 depicts the cDNA sequence (SEQ ID NO:42) and
predicted amino acid sequence (SEQ ID NO:62) of murine
TANGO 189.

Figure 19 depicts the cDNA sequence (SEQ ID NO:10) and
predicted amino acid sequence (SEQ ID NO:32) of human
TANGO 215.

Figure 20 depicts the cDNA sequence (SEQ ID NO:11) and
predicted amino acid sequence of human TANGO 187-1/3 (SEQ
ID NO:22).

Figure 21 depicts the cDNA sequence (SEQ ID NO:43;
partial) and predicted amino acid sequence of murine
TANGO 187 (SEQ ID NO:63; partial).

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ ID
NO:54) TANGO 180.

Figure 23 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:24) and murine (SEQ ID
NO:55; partial) TANGO 181.

Figure 24 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:25) and murine (SEQ ID
NO:56; partial) TANGO 182.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:26) and murine (SEQ ID
NO:57) TANGO 183.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:27) and murine (SEQ ID
NO:58) TANGO 184.

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:28) and murine (SEQ ID
NO:59) TANGO 185.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:29) and murine (SEQ ID
NO:60) TANGO 186.

Figure 29 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:30) and murine (SEQ ID
NO:61) TANGO 188.

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:31) and murine (SEQ ID
NO:62) TANGO 189.

Figure 31 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:33) and murine (SEQ ID
NO:63; partial) TANGO 187.

Figure 32 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:1) and murine (SEQ ID NO:34) TANGO 180.

Figure 33 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:2) and murine (SEQ ID NO:35; partial)
TANGO 181.

Figure 34 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:3) and murine (SEQ ID NO:36; partial)
TANGO 182.

Figure 35 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:4) and murine (SEQ ID NO:37) TANGO 183.

Figure 36 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:5) and murine (SEQ ID NO:38) TANGO 184.

Figure 37 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:6) and murine (SEQ ID NO:39) TANGO 185.

Figure 38 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:7) and murine (SEQ ID NO:40) TANGO 186.

Figure 39 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:8) and murine (SEQ ID NO:41) TANGO 188.

Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:9) and murine (SEQ ID NO:42) TANGO 189.

Figure 41 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:11) and murine (SEQ ID NO:43; partial)
TANGO 187.

Figure 42 depicts an alignment of the amino acid
sequences of human TANGO 181 (SEQ ID NO:24), murine TANGO
181 (SEQ ID NO:55; partial), human TANGO 182 (SEQ ID
NO:25), and murine TANGO 182 (SEQ ID NO:56; partial).

Figure 43 depicts an alignment of the amino acid
sequences of human TANGO 184 (SEQ ID NO:27) and human
TANGO 183 (SEQ ID NO:26).

Figure 44 depicts an alignment of the amino acid
sequences of murine TANGO 184 (SEQ ID NO:58) and murine
TANGO 183 (SEQ ID NO:57).

Figure 45 depicts an alignment of the amino acid
sequences of human TANGO 180 (SEQ ID NO:23), murine TANGO
180 (SEQ ID NO:54), agkistrodon PLA2 (SEQ ID NO:109),
acanthahis PLA2 (SEQ ID NO:110), and bovine PLA2 (SEQ ID
NO:111).

Figure 46 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-2/3.

Figure 48 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO:___) and
predicted amino acid sequence (SEQ ID NO:___) of TANGO
187.

Figure 53 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 181.

Figure 54 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 182.

Figure 55 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 187.

Figure 56 depicts a complete cDNA sequence (SEQ ID
NO:___) and predicted amino acid sequence (SEQ ID NO:___)
of murine TANGO 215.

Detailed Description of the Invention 
The present invention is based on the discovery of cDNA
molecules encoding TANGO 180, TANGO 181, TANGO 182, TANGO
183, TANGO 184, TANGO 185, TANGO 186, TANGO 188, TANGO
189, TANGO 215, and TANGO 187, all of which are predicted
to be either wholly secreted or transmembrane proteins.

TANGO 180

The human TANGO 180 cDNA of SEQ ID NO:1 has a 567
nucleotide open reading frame (SEQ ID NO:12) encoding a
189 amino acid protein (SEQ ID NO:23). The cDNA and
protein sequences of human TANGO 180 are shown in Figure
1.

Human TANGO 180 is predicted to be a wholly secreted
protein having a 22 amino acid signal sequence (amino
acids 1 - 22 of SEQ ID NO:23; SEQ ID NO:64) followed by a
167 amino acid mature protein (amino acids 23 - 189 of
SEQ ID NO:23; SEQ ID NO:76). TANGO 180 is predicted to
have a molecular weight of 21.0 kDa prior to cleavage of
its signal peptide and a molecular weight of 18.5 kDa
subsequent to cleavage of its signal peptide.
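
The length figures above follow from simple arithmetic, which
the following Python sketch makes explicit. It assumes, as the
stated numbers imply (567 / 3 = 189), that the open reading
frame length is counted without a stop codon, and it uses
1-based inclusive residue coordinates as in the text. The
function names are hypothetical and chosen for illustration.

```python
def orf_to_protein_length(orf_nucleotides: int) -> int:
    """Number of amino acids encoded by an ORF of the given length.

    Assumes the ORF length excludes the stop codon.
    """
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nucleotides // 3


def split_signal_peptide(protein_length: int, signal_length: int):
    """Return 1-based inclusive (start, end) ranges for the signal
    sequence and for the mature protein left after cleavage."""
    if not 0 < signal_length < protein_length:
        raise ValueError("signal length must be between 1 and protein length - 1")
    signal = (1, signal_length)
    mature = (signal_length + 1, protein_length)
    return signal, mature


if __name__ == "__main__":
    length = orf_to_protein_length(567)           # human TANGO 180 ORF
    signal, mature = split_signal_peptide(length, 22)
    print(length, signal, mature)                 # 189 (1, 22) (23, 189)
    print(mature[1] - mature[0] + 1)              # 167
```

The same two calls reproduce the coordinates given for the
other proteins described below when their ORF and signal
sequence lengths are substituted.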

The murine TANGO 180 of SEQ ID NO:34 has a 576
nucleotide open reading frame (SEQ ID NO:44) encoding a
192 amino acid protein (SEQ ID NO:54). The cDNA and
protein sequences of murine TANGO 180 are shown in Figure
2.

Figure 22 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:23) and murine (SEQ
ID NO:54) TANGO 180 (88.7% identity). Figure 32 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:1)
and murine (SEQ ID NO:34) TANGO 180 (55% identity).

Northern analysis of human TANGO 180 mRNA expression
revealed the presence of two major transcripts (1.3 and
5.25 kb) and three minor transcripts (0.95, 1.8, and 4.15
kb). This analysis also revealed that all five
transcripts are expressed at a low level in placenta,
lung, and liver; that the 1.3 and the 5.25 kb transcripts
are expressed at a moderate level in brain and kidney;
that the 5.25 kb transcript is expressed at a moderate
level in heart, skeletal muscle, and pancreas; and that
the 1.3 kb transcript is expressed at a high level in
heart, skeletal muscle, and pancreas.




In situ expression analysis of TANGO 180 in adult
murine tissue revealed no significant expression in
bladder, pancreas, heart, thymus, kidney, brain, colon,
placenta, eye, liver, spleen, lung, skeletal
muscle/diaphragm, or small intestine. In situ expression
analysis of murine embryonic tissue revealed expression
in the liver at E13.5 through E16.5. Liver expression
was also observed, although at a lower level, at E17.5
and P1.5.

TANGO 180 maps to human chromosome location 4q25.
TANGO 180 is predicted to have a phospholipase A2
histidine active site domain at amino acids 106-113 of
SEQ ID NO:23 and a phospholipase A2 aspartic acid active
site-like domain at amino acids 124-131 of SEQ ID NO:23.

An apparent genomic sequence of TANGO 180 appears at
GenBank Accession Number AC004067.

Human TANGO 180 bears some similarity to a number of C. 
elegans proteins. 

TANGO 180 bears some similarity to a number of known
phospholipase A2 (PLA2) proteins (Lambeau et al. (1994)
J. Biol. Chem. 269:1575-78; Lambeau et al. (1995) J.
Biol. Chem. 270:5534-40). TANGO 180 may play a role
similar to that of a phospholipase A2. Figure 45
depicts an alignment of the amino acid sequences of
human TANGO 180 (SEQ ID NO:23), murine TANGO 180 (SEQ ID
NO:54), agkistrodon PLA2 (SEQ ID NO:109), acanthahis PLA2
(SEQ ID NO:110), and bovine PLA2 (SEQ ID NO:111). There
are thought to be at least two important regions within
many PLA2's: CCXXHCCX (histidine at active site) and
LIVMACLIVMFYWPCSTCDXXXXXC (aspartic acid active site).
Various phospholipase A2 proteins are thought to be
involved in inflammation. Moreover, it appears that the
expression and synthesis of at least some phospholipase
A2 proteins are induced by pro-inflammatory modulators
such as interleukin-1, interleukin-6, and tumor necrosis
factor. Thus, TANGO 180 may be involved in inflammation,
e.g., arthritis, endotoxic shock, peritonitis, psoriasis,
acute pancreatitis, and respiratory distress syndrome.
Accordingly, TANGO 180 nucleic acid molecules and
polypeptides as well as anti-TANGO 180 antibodies and
modulators of TANGO 180 expression or activity may be
useful in the treatment of such disorders. Moreover,
PLA2's have been implicated in digestion, airway
contraction, smooth muscle contraction, fertilization,
and cell proliferation. Thus, TANGO 180 nucleic acid
molecules and polypeptides as well as anti-TANGO 180
antibodies and modulators of TANGO 180 expression or
activity may be useful in the treatment of disorders of
digestion, airway contraction, smooth muscle contraction,
fertilization, and cell proliferation.
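
As a rough sketch of how a short motif such as the histidine
active site region named above might be located in a protein
sequence, the following Python snippet converts a motif
string into a regular expression and reports match positions.
It reads the motif literally as printed in the text and
assumes that "X" stands for any residue; the pattern as
printed may be an abbreviated rendering, and curated motif
databases may define these sites differently. The longer
aspartic acid pattern is not decoded here. The function names
and the example sequence are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif string into a regex, treating 'X' as any residue."""
    return re.compile(
        "".join("." if ch == "X" else re.escape(ch) for ch in motif)
    )


def find_motif(sequence: str, motif: str):
    """Return 1-based inclusive (start, end) positions of each
    non-overlapping occurrence of the motif in the sequence."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Made-up sequence for illustration; not a sequence of the invention.
    example = "MKCCAGHCCLT"
    print(find_motif(example, "CCXXHCCX"))  # [(3, 10)]
```

Positions are reported in the same 1-based inclusive style
used for the domain coordinates in the text.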

TANGO 181 

The human TANGO 181 cDNA of SEQ ID NO:2 has a 1017
nucleotide open reading frame (SEQ ID NO:12) encoding a
339 amino acid protein (SEQ ID NO:24). The cDNA and
protein sequences of human TANGO 181 are shown in Figure
3.

Human TANGO 181 is predicted to be a secreted protein
having a 22 amino acid signal sequence (amino acids 1 -
22 of SEQ ID NO:24; SEQ ID NO:65) followed by a 317 amino
acid mature protein (amino acids 23 - 339 of SEQ ID
NO:24; SEQ ID NO:77). TANGO 181 is predicted to have a
molecular weight of 37.8 kDa prior to cleavage of its
signal peptide and a molecular weight of 35.2 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 181 partial cDNA of SEQ ID NO:35 has a
747 nucleotide open reading frame (SEQ ID NO:45) encoding
a 249 amino acid protein (SEQ ID NO:55). The partial
cDNA and protein sequences of murine TANGO 181 are shown
in Figure 4.

Figure 23 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:24) and murine (SEQ
ID NO:55; partial) TANGO 181 (72.1% identity). Figure 33
depicts an alignment of the cDNA sequences of human (SEQ
ID NO:2) and murine (SEQ ID NO:35; partial) TANGO 181
(65.4% identity). The pair of cysteines at amino acids
76 and 129 might be important for disulfide bond
formation. The single cysteine at amino acid 262 might
enable TANGO 181 to form homodimers (or heterodimers with
TANGO 182).

The cDNA sequence (SEQ ID NO:___) and predicted amino
acid sequence (SEQ ID NO:___) of a full-length murine
TANGO 181 clone are shown in Figure 53.

Northern analysis of human TANGO 181 mRNA expression
revealed the presence of two transcripts (4.3 and 4.5 kb)
expressed at a low level in heart, brain, placenta, lung,
liver, skeletal muscle, kidney, and pancreas, with the
level of expression in the pancreas being higher than in
the other tissues.

Murine in situ expression analysis revealed that TANGO
181 is weakly expressed in adult brain (choroid plexus
and olfactory bulb). This analysis also revealed TANGO
180 expression in the liver and kidney (medulla). High
level TANGO 180 expression was observed in testis. This
analysis detected little or no expression of TANGO 181 in
adult liver, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, and eye. In situ expression analysis of
embryos revealed that TANGO 181 is ubiquitously expressed
at stages E12.5, E13.5, and E14.5.

TANGO 181 maps to human chromosome location 8p12.
WI-5768 and AFMB057WG5 are markers which flank TANGO 181.
Nearby loci include WRN (Werner Syndrome) and SPG5A
(Spastic Paraplegia 5A), and nearby known genes include
FGFR1 (fibroblast growth factor receptor), STAR
(Steroidogenic acute regulatory protein), ANK1 (ankyrin
1), CALB1 (calbindin 1), and CHRNB3 (cholinergic receptor,
nicotinic). The human chromosomal location corresponds
to a position on mouse chromosome 8 near fgfr1
(fibroblast growth factor receptor), cyrn (cyritesin 1),
tissue plasminogen activator, and ank (ankyrin 1).

Within the 3' untranslated region of the human TANGO
181 cDNA described above is a 260 base pair sequence
(GenBank Accession Number Z36802) previously identified
as part of a gene that appears to be preferentially
expressed in pancreatic cancer and chronic pancreatitis
(Gress et al. (1996) Oncogene 13:1819-30). Thus, TANGO
181 nucleic acids and polypeptides may be useful for the
diagnosis and/or treatment of chronic pancreatitis and
pancreatic cancer (as well as other cancers). In
addition, modulators of TANGO 181 expression or activity
may be useful in the treatment of such disorders.

TANGO 181 and TANGO 182 are highly homologous to the C.
elegans protein C42C1.9.

TANGO 182

The human TANGO 182 cDNA of SEQ ID NO:3 has a 1044
nucleotide open reading frame (SEQ ID NO:14) encoding a
348 amino acid protein (SEQ ID NO:25). The cDNA and
protein sequences of human TANGO 182 are shown in Figure
5.

Human TANGO 182 is predicted to be a secreted protein 
having a 23 amino acid signal sequence (amino acids 1 - 
23 of SEQ ID NO:25; SEQ ID NO:66) followed by a 325 amino 
acid mature protein (amino acids 24 - 348 of SEQ ID 
30 NO: 25; SEQ ID NO: 78) . TANGO 182 is predicted to have a 
molecular weight of 39.2 kDa prior to cleavage of its 
signal peptide and a molecular weight of 36.1 kDa 
subsequent to cleavage of its signal peptide. 
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The murine TANGO 182 partial cDNA of SEQ ID NO: 36 has 
an 825 nucleotide open reading frame (SEQ ID NO: 46) 
encoding a 275 amino acid protein (SEQ ID NO: 56) . The 
partial cDNA and protein sequences of murine TANGO 182 
5 are shown in Figure 6. Figure 24 depicts an alignment 

of the predicted amino acids sequences of human (SEQ ID 
NO:25) and murine (SEQ ID NO:56; partial) TANGO 182 
(75.1% identity). Figure 34 depicts an alignment of the 
cDNA sequences of human (SEQ ID NO: 3) and murine (SEQ ID 

10 NO:36; partial) TANGO 182 (67.6% identity). The pair of 
cysteines at amino acids 78 and 13 0 might be important 
for disulfide bond formation. The single cysteine at 
amino acid 312 might enable TANGO 182 to form homodimers 
(or heterodimers with TANGO 181) . 
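
The length figures quoted above for TANGO 182 can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the disclosed sequences: the function names are hypothetical, and it assumes that the quoted open reading frame lengths count only the coding codons (three nucleotides per amino acid, with no stop codon included), which is what the quoted numbers appear to imply.

```python
def codons_in_orf(orf_nucleotides: int) -> int:
    """Return the number of complete codons in an ORF of the given length.

    Assumption: the ORF length counts coding codons only (no stop codon).
    """
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nucleotides // 3


def mature_length(protein_length: int, signal_length: int) -> int:
    """Return the residues remaining after the N-terminal signal sequence is removed."""
    if not 0 <= signal_length < protein_length:
        raise ValueError("signal sequence must be shorter than the protein")
    return protein_length - signal_length


# Figures quoted in the text for TANGO 182.
human_orf_nt, human_protein_aa, human_signal_aa = 1044, 348, 23
murine_partial_orf_nt, murine_partial_protein_aa = 825, 275

print(codons_in_orf(human_orf_nt) == human_protein_aa)
print(codons_in_orf(murine_partial_orf_nt) == murine_partial_protein_aa)
print(mature_length(human_protein_aa, human_signal_aa))
```

Under these assumptions the human and partial murine open reading frames are each exactly three times the quoted protein length, and removing the 23 residue signal sequence from the 348 residue human protein leaves the 325 residue mature protein stated above.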

The cDNA sequence (SEQ ID NO: ) and predicted amino
acid sequence (SEQ ID NO: ) of a full-length murine
TANGO 182 clone are shown in Figure 54.

TANGO 182 maps to human chromosomal location 10q24
between markers D10S566 and D10S540. In mice, TANGO 182
maps to chromosome 10 between D10S198 and D10S192 (129.8
to 131.2 CM).

Northern analysis of human TANGO 182 mRNA expression
revealed the presence of a 2.8 kb transcript that is
expressed at a high level in placenta and a somewhat lower
level in liver, kidney, and pancreas. This transcript is
expressed at a low level in heart, brain, lung, and
skeletal muscle.

Murine in situ expression analysis revealed that TANGO
182 is expressed at a high level in testis in adult mice.
Little or no expression was detected in adult brain,
liver, kidney, ovary, heart, lung, spleen, fat, muscle,
skin, stomach, duodenum, colon, pancreas, thymus,
pituitary, or eye by in situ analysis. In situ
expression analysis of embryos revealed ubiquitous, low
level expression at stages E12.5, E13.5, and E14.5.

Both human and mouse TANGO 182 are quite similar to
human and murine TANGO 181 at the amino acid level
(Figure 42). Thus, TANGO 182, like TANGO 181, may be
useful for the diagnosis and/or treatment of pancreatic
cancer and chronic pancreatitis as well as other cancers.
In addition, TANGO 182 bears some similarity to a C.
elegans protein C42C1.9 (Genbank Accession Number
AF043695) that is encoded by a gene that is present in
the same operon as a gene encoding a mitochondrial
carrier protein. Since genes within the same operon are
often co-regulated and encode proteins involved in the
same physiological state, TANGO 182 may play a role in
metabolism. Thus, TANGO 182 nucleic acids and
polypeptides as well as antibodies directed against TANGO
182 may be useful in the diagnosis and treatment of
metabolic disorders. In addition, modulators of TANGO
182 expression or activity may be useful in the treatment
of such disorders.



TANGO 183 

The human TANGO 183 cDNA of SEQ ID NO:4 has a 549
nucleotide open reading frame (SEQ ID NO:15) encoding a
183 amino acid protein (SEQ ID NO:26). The cDNA and
protein sequences of human TANGO 183 are shown in Figure
7.

Human TANGO 183 is predicted to be a transmembrane
protein having a 20 amino acid signal sequence (amino
acids 1 - 20 of SEQ ID NO:26; SEQ ID NO:67) followed by a
163 amino acid mature protein (amino acids 21 - 183 of
SEQ ID NO:26; SEQ ID NO:79) having a 69 amino acid
extracellular domain (amino acids 21 - 89 of SEQ ID
NO:26; SEQ ID NO:88), a 23 amino acid transmembrane
domain (amino acids 90 - 112 of SEQ ID NO:26; SEQ ID
NO:94), and a 71 amino acid cytoplasmic domain (amino
acids 113 - 183 of SEQ ID NO:26; SEQ ID NO:102). There
are 8 conserved cysteines in the extracellular domain.
TANGO 183 has a high proportion of charged amino acids in
the predicted extracellular (18%, not including
histidines) and cytoplasmic (32%) domains. Human TANGO
183 is predicted to have a molecular weight of 20.6 kDa
prior to cleavage of its signal peptide and a molecular
weight of 18.1 kDa subsequent to cleavage of its signal
peptide.
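
The proportion of charged amino acids cited above can be computed for any domain sequence as the fraction of residues that are aspartate, glutamate, lysine, or arginine. The sketch below is illustrative only: the function name and the ten residue peptide are hypothetical and are not taken from any SEQ ID NO, and the choice to exclude histidine by default simply mirrors the "not including histidines" qualifier in the text.

```python
CHARGED_RESIDUES = frozenset("DEKR")  # histidine deliberately excluded by default


def charged_fraction(sequence: str, charged=CHARGED_RESIDUES) -> float:
    """Return the fraction of residues in `sequence` that belong to `charged`."""
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    return sum(aa in charged for aa in residues) / len(residues)


# Hypothetical peptide used only to exercise the function.
toy_domain = "MKDELLRAGH"

print(f"{charged_fraction(toy_domain):.0%}")                       # K, D, E, R counted
print(f"{charged_fraction(toy_domain, frozenset('DEKRH')):.0%}")   # histidine also counted
```

Applying the same function to the extracellular and cytoplasmic portions of SEQ ID NO:26 would be one way to reproduce the percentages quoted above, provided the same residue set is used.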

The murine TANGO 183 cDNA of SEQ ID NO:37 has a 549
nucleotide open reading frame (SEQ ID NO:47) encoding a
183 amino acid protein (SEQ ID NO:57). The cDNA and
protein sequences of murine TANGO 183 are shown in Figure
8.

Figure 25 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:26) and murine (SEQ
ID NO:57) TANGO 183 (97.3% identity). Figure 35 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:4)
and murine (SEQ ID NO:37) TANGO 183 (71.7% identity).
The conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Northern analysis of human TANGO 183 mRNA expression
revealed the presence of a 1.6 kb transcript that is
expressed at a high level in brain, kidney, pancreas, and
heart; at a moderate level in liver and skeletal muscle;
and at a low level in placenta and lung.

The nucleic acid sequence of TANGO 183 is related to a
sequence tagged site at chromosomal location 11p15.4, and
TANGO 183 may map to this site.

The predicted cytoplasmic domain of TANGO 183 has a
relatively high number of charged residues (32%). This
suggests that TANGO 183 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 183 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 183 protein or aberrantly regulated
TANGO 183 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Accordingly, TANGO 183 nucleic acid molecules and
polypeptides as well as anti-TANGO 183 antibodies and
modulators of TANGO 183 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 183 and TANGO 184 are related and may play
similar functional roles. Figure 43 depicts an alignment
of the amino acid sequences of human TANGO 184 (SEQ ID
NO:27) and human TANGO 183 (SEQ ID NO:26). Figure 44
depicts an alignment of the amino acid sequences of
murine TANGO 184 (SEQ ID NO:58) and murine TANGO 183 (SEQ
ID NO:57).

TANGO 183 is related to C. elegans R12C12.6 (GenBank
Accession No. U23510).

TANGO 184 

The human TANGO 184 cDNA of SEQ ID NO:5 has a 594
nucleotide open reading frame (SEQ ID NO:16) encoding a
198 amino acid protein (SEQ ID NO:27). The cDNA and
protein sequences of human TANGO 184 are shown in Figure
9.

Human TANGO 184 is predicted to be a transmembrane
protein having a 28 amino acid signal sequence (amino
acids 1 - 28 of SEQ ID NO:27; SEQ ID NO:68) followed by a
170 amino acid mature protein (amino acids 29 - 198 of
SEQ ID NO:27; SEQ ID NO:80) having a 74 amino acid
extracellular domain (amino acids 29 - 102 of SEQ ID
NO:27; SEQ ID NO:89), a 23 amino acid transmembrane domain
(amino acids 103 - 125 of SEQ ID NO:27; SEQ ID NO:95),
and a 73 amino acid cytoplasmic domain (amino acids 126 -
198 of SEQ ID NO:27; SEQ ID NO:103). TANGO 184 has a
high proportion of charged amino acids in the predicted
extracellular (31%) and cytoplasmic (29%) domains.
Notably, the transmembrane regions include charged
residues. Human TANGO 184 is predicted to have a
molecular weight of 22.5 kDa prior to cleavage of its
signal peptide and a molecular weight of 18.9 kDa
subsequent to cleavage of its signal peptide.

The murine TANGO 184 cDNA of SEQ ID NO:38 has a 357
nucleotide open reading frame (SEQ ID NO:48) encoding a
199 amino acid protein (SEQ ID NO:58). The cDNA and
protein sequences of murine TANGO 184 are shown in Figure
10.

Figure 26 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:27) and murine (SEQ
ID NO:58) TANGO 184 (94.5% identity). Figure 36 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:5)
and murine (SEQ ID NO:38) TANGO 184 (63.8% identity).

Northern analysis of human TANGO 184 mRNA expression
revealed the presence of a 2 kb transcript that is
expressed at a high level in heart, brain, placenta,
skeletal muscle, kidney, and pancreas; and at a low level
in lung and liver. There are two alternative polyA
sites: nucleotide 1000 and nucleotide 2000.

In situ analysis of TANGO 184 expression in adult mice
revealed expression in the brain (moderate, ubiquitous
expression), spinal cord (weak expression in the region
of the grey matter), submandibular gland (strong,
ubiquitous expression), stomach (weak expression in the
muscle region), kidney (weak, ubiquitous expression in
the cortex and medulla, stronger expression in papilla),
adrenal gland (weak ubiquitous expression), thymus (weak
expression in cortex), lymph node (moderate ubiquitous
expression), spleen (weak expression in follicles),
skeletal muscle/smooth muscle (diaphragm), testis (strong
expression in the area surrounding the seminiferous
tubules), ovaries (weak expression), and placenta
(moderate, ubiquitous expression). This analysis did not
reveal significant expression in white fat, brown fat,
heart, lung, liver, pancreas, colon, small intestine, and
bladder. In embryonic tissue, this analysis revealed
expression at E13.5 (weak to moderate ubiquitous
expression with higher expression in the brain and
liver), E14.5 (weak to moderate ubiquitous expression
with higher expression in the brain and liver), E15.5
(moderate ubiquitous expression with higher expression in
the brain), E16.5 (weak to moderate ubiquitous expression
with higher expression in the brain, spinal cord, brown
fat, submandibular gland, lung, stomach, and intestines),
E18.5 (weak to moderate ubiquitous expression with higher
expression in the brain, spinal cord, brown fat,
submandibular gland, lung, stomach, and intestines), and
P1.5 (weak ubiquitous expression with higher expression in
brain, submandibular gland, olfactory epithelium, and
stomach).

The predicted cytoplasmic domain of TANGO 184 has a
relatively high number of charged residues (29%). This
suggests that TANGO 184 may non-covalently, e.g.,
electrostatically, associate with an intracellular
molecule such as a cytoskeletal component. Accordingly,
TANGO 184 may itself be involved in maintaining the
structural integrity of cells in which it is expressed.
If so, aberrant TANGO 184 protein or aberrantly regulated
TANGO 184 could be involved in alterations in cellular
morphology, e.g., alterations associated with metastasis.
Accordingly, TANGO 184 nucleic acid molecules and
polypeptides as well as anti-TANGO 184 antibodies and
modulators of TANGO 184 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer, or cell migration, e.g., tumor metastasis.

TANGO 185 

The human TANGO 185 cDNA of SEQ ID NO:6 has a 579
nucleotide open reading frame (SEQ ID NO:17) encoding a
193 amino acid protein (SEQ ID NO:28). The cDNA and
protein sequences of human TANGO 185 are shown in Figure
11.

Human TANGO 185 is predicted to be a transmembrane
protein having a 24 amino acid signal sequence (amino
acids 1 - 24 of SEQ ID NO:28; SEQ ID NO:69) followed by a
169 amino acid mature protein (amino acids 25 - 193 of
SEQ ID NO:28; SEQ ID NO:81) having two extracellular
domains, one having 51 amino acids (amino acids 25 - 75
of SEQ ID NO:28; SEQ ID NO:90), and a second having 19
amino acids (amino acids 132 - 150 of SEQ ID NO:28; SEQ
ID NO:91); three transmembrane domains, one having 27
amino acids (amino acids 76 - 102 of SEQ ID NO:28; SEQ ID
NO:96), a second having 22 amino acids (amino acids 110 -
131 of SEQ ID NO:28; SEQ ID NO:97), the third having 24
amino acids (amino acids 151 - 174 of SEQ ID NO:28; SEQ
ID NO:98); and two cytoplasmic domains, one having 7
amino acids (amino acids 103 - 109 of SEQ ID NO:28; SEQ
ID NO:104), and a second having 19 amino acids (amino
acids 175 - 193 of SEQ ID NO:28; SEQ ID NO:105). The
predicted 22 amino acid transmembrane domain and the
predicted 24 amino acid domain, along with the predicted
7 amino acid cytoplasmic domain, may form one hydrophobic
domain that passes through the membrane twice. TANGO 185
is predicted to have a molecular weight of 21.4 kDa prior
to cleavage of its signal peptide and a molecular weight
of 18.8 kDa subsequent to cleavage of its signal peptide.
Notably, the transmembrane regions have charged residues.
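
The predicted topology of human TANGO 185 recited above can be checked for internal consistency by confirming that the recited segments tile the 193 amino acid protein without gaps or overlaps. The following is a minimal sketch under that reading: the coordinates are copied from the text, while the segment labels, the variable name, and the function name are hypothetical conveniences.

```python
from typing import Dict, List, Tuple

# (label, first residue, last residue), 1-based and inclusive, as recited in the text.
Segment = Tuple[str, int, int]

TANGO_185_HUMAN: List[Segment] = [
    ("signal sequence", 1, 24),
    ("extracellular 1", 25, 75),
    ("transmembrane 1", 76, 102),
    ("cytoplasmic 1", 103, 109),
    ("transmembrane 2", 110, 131),
    ("extracellular 2", 132, 150),
    ("transmembrane 3", 151, 174),
    ("cytoplasmic 2", 175, 193),
]


def segment_lengths(segments: List[Segment], protein_length: int) -> Dict[str, int]:
    """Return {label: length} after checking that the segments tile 1..protein_length."""
    expected_start = 1
    lengths: Dict[str, int] = {}
    for label, first, last in segments:
        if first != expected_start:
            raise ValueError(f"{label}: expected start {expected_start}, got {first}")
        if last < first:
            raise ValueError(f"{label}: end {last} precedes start {first}")
        lengths[label] = last - first + 1
        expected_start = last + 1
    if expected_start - 1 != protein_length:
        raise ValueError("segments do not cover the full protein")
    return lengths


for label, length in segment_lengths(TANGO_185_HUMAN, 193).items():
    print(f"{label}: {length} amino acids")
```

With the coordinates as recited, the check passes and the computed segment lengths (24, 51, 27, 7, 22, 19, 24, and 19 amino acids) agree with the lengths stated in the text.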



WO 00/18904 PCT/US99/22817

The murine TANGO 185 cDNA of SEQ ID NO:39 has a 579
nucleotide open reading frame (SEQ ID NO:49) encoding a
193 amino acid protein (SEQ ID NO:59). The cDNA and
protein sequences of murine TANGO 185 are shown in Figure
12.

Figure 27 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:28) and murine (SEQ
ID NO:59) TANGO 185 (90.7% identity). Figure 37 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:6)
and murine (SEQ ID NO:39) TANGO 185 (71.1% identity).

Human TANGO 185 maps to chromosome 6. 

Northern analysis of human TANGO 185 mRNA expression
revealed the presence of a 2.2 kb major transcript and a
4.2 kb minor transcript. This analysis also revealed
that the 2.3 kb transcript is expressed at a high level
in heart, placenta, and pancreas; at a moderate level in
lung, liver, and kidney; and at a very low level, if at
all, in brain and skeletal muscle. The 4.2 kb transcript
is expressed at a low level in placenta.

In situ analysis of TANGO 185 expression in adult mice
revealed expression in the brain (choroid plexus),
submandibular gland (ubiquitous expression), white fat
(weak expression, possible mammary gland expression),
stomach (mucosal epithelium), kidney (medulla-cortex
transition and medullary rays), colon (weak expression in
the epithelium), small intestine (villi), thymus (low
level expression), bladder (mucosal epithelium), and
placenta (ubiquitous expression in decidua region). This
analysis did not reveal significant expression in adult
eye and harderian gland, brown fat, heart, lung, liver,
spleen, pancreas, skeletal muscle, testes, and ovaries.

In situ analysis of TANGO 185 embryonic expression in
mice revealed expression at E13.5 (high level expression
in the skin and submaxillary gland and low level
ubiquitous expression in the liver); E14.5 (high level
expression in the choroid plexus of the lateral and fourth
ventricles, skin, epithelium of the oral cavity, follicles
of vibrissa, submaxillary gland, stomach, and heart;
expression in lung (especially the developing large
airways) and liver (ubiquitous expression)). At E15.5 the
observed expression pattern is nearly identical to that
at E14.5 except that there is expression in the region
outlining the intestinal tract and lung expression is
ubiquitous with higher expression in the region outlining
the large airways.

At E16.5 high level expression is observed in skin,
choroid plexus, the lining of the oral and nasal cavity,
esophagus, bladder, stomach, intestine, large vessels of
the heart, large airways of the lung, and the region
outlining the vertebrae. Lower ubiquitous expression is
present in the heart, lung and thymus. A somewhat
higher, multifocal expression is present in the thymus.

At E18.5 the expression pattern is identical to that
observed at E16.5 except that expression is also observed
in developing hair follicles.

At P1.5 the expression pattern is identical to that
observed at E16.5 except that there is no longer
significant expression in the region outlining the
vertebrae.

The expression pattern of TANGO 185 during embryonic
development suggests that TANGO 185 expression is
strongly associated with squamous and mucosal epithelial
cells.

The expression pattern of TANGO 185 suggests that it is
involved in cell development and/or cell differentiation.
Accordingly, TANGO 185 nucleic acid molecules and
polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of disorders associated with
aberrant cell development or cell differentiation, e.g.,
cancer. There is evidence that TANGO 185 is expressed in
prostate cells. Thus, TANGO 185 nucleic acid molecules
and polypeptides as well as anti-TANGO 185 antibodies and
modulators of TANGO 185 expression or activity may be
useful in the treatment of prostate cancer.

TANGO 186 

The human TANGO 186 cDNA of SEQ ID NO:7 has a 1149
nucleotide open reading frame (SEQ ID NO:18) encoding a
383 amino acid protein (SEQ ID NO:29). The cDNA and
protein sequences of human TANGO 186 are shown in Figure
13.

Human TANGO 186 is predicted to be a secreted protein
having a 20 amino acid signal sequence (amino acids 1 -
20 of SEQ ID NO:29; SEQ ID NO:70) followed by a 363 amino
acid mature protein (amino acids 21 - 383 of SEQ ID
NO:29; SEQ ID NO:82). There are eight cysteines in
mature TANGO 186. Some or all of these might be involved
in disulfide bond formation. Human TANGO 186 is
predicted to have a molecular weight of 43.0 kDa prior to
cleavage of its signal peptide and a molecular weight of
40.3 kDa subsequent to cleavage of its signal peptide.

The murine TANGO 186 cDNA of SEQ ID NO:40 has a 1146
nucleotide open reading frame (SEQ ID NO:50) encoding a
382 amino acid protein (SEQ ID NO:60). The cDNA and
protein sequences of murine TANGO 186 are shown in Figure
14. Conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Figure 28 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:29) and murine (SEQ
ID NO:60) TANGO 186 (90.9% identity). Figure 38 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:7)
and murine (SEQ ID NO:40) TANGO 186 (41.6% identity).
The human and murine TANGO 186 proteins are highly
similar except within three portions: the signal
sequence, a hinge region at amino acids 108-123, and a
hinge region at amino acids 198-216. Within these three
portions the proteins are only about 50% identical.
Outside of these three portions the proteins are about
97.3% identical.
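
The region-by-region comparison described above amounts to computing percent identity over selected columns of a pairwise alignment. The sketch below illustrates one way to do so. It is not the method used to generate the figures: the function names and the two ten residue sequences are hypothetical, the denominator (all aligned columns, counting columns with a gap in one sequence as mismatches) is an assumed convention that differs between alignment programs, and the region coordinates are alignment columns rather than the amino acid positions given in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Columns that are gaps in both sequences are ignored; a gap in only one
    sequence counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def region_identity(aligned_a: str, aligned_b: str, first: int, last: int) -> float:
    """Percent identity over alignment columns first..last (1-based, inclusive)."""
    return percent_identity(aligned_a[first - 1:last], aligned_b[first - 1:last])


# Hypothetical aligned sequences, not taken from SEQ ID NO:29 or SEQ ID NO:60.
toy_human = "MKTLLAGCVW"
toy_murine = "MRSLLAGCVW"

print(percent_identity(toy_human, toy_murine))        # whole alignment
print(region_identity(toy_human, toy_murine, 1, 4))   # divergent portion
print(region_identity(toy_human, toy_murine, 5, 10))  # conserved portion
```

In this toy example the divergent portion pulls the overall identity below that of the conserved portion, which is the same pattern reported for the signal sequence and hinge regions of TANGO 186.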

TANGO 186 maps to human chromosome 11q14.

Northern analysis of human TANGO 186 mRNA expression
revealed the presence of a 1.8 kb transcript and a 4 kb
transcript. Both transcripts are expressed at a low
level in heart, lung, liver, skeletal muscle, kidney, and
pancreas and at a very low level in brain.

In situ analysis of TANGO 186 in adult mice revealed
that TANGO 186 is expressed in brain (olfactory bulb),
spleen (low level ubiquitous signal), small intestine
(very strong signal in villi and submucosa), colon
(ubiquitous signal), kidney (cortical and medullary
region), lung (bronchial epithelium), eye (iris and
cornea), and placenta (strong signal in the outer
membrane). This analysis did not detect expression in
adult pancreas, heart, skeletal muscle, diaphragm,
esophagus, liver, and thymus.

In situ expression analysis of murine embryonic
sagittal sections revealed expression at stage E13.5 in
epithelium of the lower and upper lip, cartilage
primordium of basisphenoid bone, cartilage condensation
of sacral vertebral body (centrum), small intestine, and
heart. At stage E14.5, in addition to the expression
observed at stage E13.5, expression was also observed in
eye (or cartilage around eye), Meckel's cartilage, and
cartilage of the limb digits. At stage E15.5 expression
was observed in vibrissae of the snout, kidney (embryonic
glomeruli), cartilage of the limb digits, cartilage of
the vertebral column, heart, eye, and small intestine.
At stage E16.5 the observed expression pattern was
similar to that observed at E15.5, but there was a
notable reduction in signal from cartilage, epithelium of
upper and lower lip, and heart. Also at stage E16.5 low
level signal was observed in the lung, and a strong
signal was still observed in the small intestine. At
stage E17.5 expression of TANGO 186 was observed to be
more ubiquitous. However, expression in cartilage was
observed to decrease with the exception of ossification
within cartilage primordium of body of mandible. At
stage E17.5 strong expression continued to be observed in
the small intestine. The expression pattern at stage
P1.5 was observed to be very similar to that observed at
stage E17.5 with expression being nearly ubiquitous with
the notable exceptions of the brain and spinal cord in
which little or no expression was observed. At stage
P1.5 the highest expression observed was in the
small intestine, lung, and kidney.

Overall, the in situ expression analysis of adult and
embryonic tissue revealed that expression is first
observed in the developing cartilage, small intestine,
and heart with the cartilage expression being most
striking in the developing vertebral column and jaw area.
Strong expression in the cartilage of the vertebral
column and developing digits was observed through stage
E16.5. Subsequently, cartilage expression was observed
to decrease with some exceptions in the jaw area. Other
embryonic tissues in which the observed expression was
notable include the kidney, specifically the embryonic
glomeruli, and the lung. These tissues continue to have
strong expression in the adult with expression in the
kidney also being observed in the medullary region and
lung expression becoming restricted to the bronchial
epithelium. Expression of TANGO 186 becomes more
ubiquitous through P1.5 with the most noticeable
exception being the brain and spinal cord. In the adult,
however, signal is observed in the olfactory bulb.

In a murine LPS disease model, increased TANGO 186
expression was observed in the brain 2 and 8 hours after
LPS treatment. Decreased TANGO 186 expression was
observed at these same time points in the kidney. TANGO
186 expression was also observed in the gastric mucosa.

As discussed above, murine in situ expression analysis
demonstrates that TANGO 186 is expressed in cartilage
throughout the embryo, suggesting that TANGO 186 is a
regulatory molecule that plays a role in bone formation
(e.g., condensation of cartilage). Accordingly, TANGO
186 nucleic acid molecules and polypeptides as well as
anti-TANGO 186 antibodies and modulators of TANGO 186
expression or activity may be useful in the diagnosis and
treatment of bone and cartilage disorders (e.g.,
osteogenesis imperfecta and broken bones, cartilage
degradation, and bone degradation). Moreover, many bone
morphogenic proteins and TGF-β family members are
regulated by extracellular proteins, e.g., noggin and
chordin. Thus, TANGO 186, which is expressed in the
heart, may play a role in heart development, and TANGO
186 nucleic acid molecules and polypeptides as well as
anti-TANGO 186 antibodies and modulators of TANGO 186
expression or activity may be useful in the diagnosis and
treatment of developmental disorders of the heart, e.g.,
valve malformation.

There is some sequence similarity between TANGO 186 and
a Bacillus serine protease. Thus, TANGO 186 may have
serine protease activity.

TANGO 188 

The human TANGO 188 cDNA of SEQ ID NO:8 has a 792
nucleotide open reading frame (SEQ ID NO:19) encoding a
264 amino acid protein (SEQ ID NO:30). The cDNA and
protein sequences of human TANGO 188 are shown in Figure
15.

Human TANGO 188 is predicted to be a secreted protein
having a 23 amino acid signal sequence (amino acids 1 -
23 of SEQ ID NO:30; SEQ ID NO:71) followed by a 241 amino
acid mature protein (amino acids 24 - 264 of SEQ ID
NO:30; SEQ ID NO:83). Human TANGO 188 is predicted to
have a molecular weight of 29.5 kDa prior to cleavage of
its signal peptide.

The murine TANGO 188 cDNA of SEQ ID NO:41 has an 807
nucleotide open reading frame (SEQ ID NO:51) encoding a
269 amino acid protein (SEQ ID NO:61). The cDNA and
protein sequences of murine TANGO 188 are shown in Figure
16.

Figure 29 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:30) and murine (SEQ
ID NO:61) TANGO 188 (80.5% identity). Figure 39 depicts
an alignment of the cDNA sequences of human (SEQ ID NO:8)
and murine (SEQ ID NO:41) TANGO 188 (71.8% identity).

TANGO 188 maps to human chromosome 16p13.3.

Northern analysis of human TANGO 188 mRNA expression
revealed the presence of a 2.0 kb transcript that is
expressed at a low level in heart and pancreas and at a
very low level, if at all, in brain, placenta, lung,
liver, skeletal muscle, and kidney.

In situ analysis of TANGO 188 expression in adult mice
did not detect significant expression in the bladder,
placenta, pancreas, eye, heart, liver, thymus, spleen,
kidney, lung, brain, skeletal muscle/diaphragm, colon, or
small intestine. In situ analysis of TANGO 188
expression in embryos revealed no significant expression
at E13.5, E14.5, E15.5, E16.5, E17.5, or P1.5. However,
in the case of both adult mice and embryos, expression of
TANGO 188 may have been obscured by a high background
signal.

TANGO 188 is transcribed in an anti-sense relationship
to NY-CO-7 (Scanlon et al. (1998) Int. J. Cancer 76:652-
58). Accordingly, TANGO 188 may have utility as a marker
for colon cancer, and TANGO 188 nucleic acid molecules
and polypeptides as well as anti-TANGO 188 antibodies and
modulators of TANGO 188 expression or activity may be
useful in the diagnosis and treatment of colon cancer or
other types of cancer.

The gene encoding the C. elegans homologue of NY-CO-7
is present in the same operon as a gene encoding a
mitochondrial import protein. Since genes within the
same operon are often co-regulated and encode proteins
involved in the same physiological state, TANGO 188 may
be a mitochondrial import protein or may be involved in
some other mitochondrial function. Thus, TANGO 188
nucleic acids and polypeptides as well as antibodies
directed against TANGO 188 and modulators of TANGO 188
expression or activity may be useful in the diagnosis and
treatment of disorders associated with defects in
mitochondrial function.

TANGO 188 appears to be the homologue of a C. elegans
protein that is present in the same operon as a gene
encoding a protein that bears some similarity to SnF8p, a
yeast zinc finger protein that is likely a transcription
factor involved in expression of genes encoding certain
proteins involved in respiration and metabolism. Since
genes within the same operon are often co-regulated and
encode proteins involved in the same physiological state,
TANGO 188 may play a role in respiration or metabolism.
Thus, TANGO 188 nucleic acids and polypeptides as well as
antibodies directed against TANGO 188 and modulators of
TANGO 188 expression or activity may be useful in the
diagnosis and treatment of disorders associated with
defects in cell respiration or metabolism.

TANGO 189

The human TANGO 189 cDNA of SEQ ID NO:9 has a 759
nucleotide open reading frame (SEQ ID NO:20) encoding a
253 amino acid protein (SEQ ID NO:31). The cDNA and
protein sequences of human TANGO 189 are shown in Figure
17.

The human TANGO 189 cDNA described above (SEQ ID NO:9;
Figure 17) represents one splice variant of TANGO 189
(splice variant 1A). There exists a second splice
variant of human TANGO 189 (splice variant 1B). The cDNA
sequence of this splice variant is the same as the cDNA
sequence of human TANGO 189 described above, except that
nucleotides 674-1087 are missing. This splice variant
cDNA encodes a 184 amino acid protein having a predicted
molecular weight of 21.1 kDa prior to cleavage of the
predicted signal sequence. Both splice variant 1A and
splice variant 1B appear to arise from a 2.1 kb
transcript which is 2055 nucleotides long, not including
the polyA sequence. This transcript encodes a 253 amino
acid protein having a predicted molecular weight of 28.6
kDa, not including the predicted signal sequence.

The 2.1 kb TANGO 189 transcript encodes a human TANGO
189 protein that is predicted to be a transmembrane
protein having a 24 or 25 amino acid signal sequence
(amino acids 1 - 24 or 1 - 25 of SEQ ID NO:31; SEQ ID NO:72
and SEQ ID NO:73) followed by a 227 or 226 amino acid
mature protein (amino acids 25 - 251 or 26 - 251 of SEQ
ID NO:31; SEQ ID NO:84 and SEQ ID NO:85) having a first
extracellular domain of 114 or 115 amino acids (amino
acids 25 - 138 or 26 - 138 of SEQ ID NO:31; SEQ ID NO:92
and SEQ ID NO:93), followed by a first transmembrane
domain (amino acids 139 - 164 of SEQ ID NO:31; SEQ ID
NO:99), a first cytoplasmic domain (amino acids 165 - 177
of SEQ ID NO:31; SEQ ID NO:106), a second transmembrane
domain (amino acids 178 - 195 of SEQ ID NO:31; SEQ ID
NO:100), a second extracellular domain (amino acids 196 -
211 of SEQ ID NO:31; SEQ ID NO:108), a third
transmembrane domain (amino acids 212 - 237 of SEQ ID
NO:31; SEQ ID NO:101), and a second cytoplasmic domain
(amino acids 238 - 253 of SEQ ID NO:31; SEQ ID NO:107).
The protein encoded by this 2.1 kb TANGO 189 transcript
is predicted to have a molecular weight of 21.8 kDa prior
to cleavage of its signal peptide and a molecular weight
of 25.2 kDa subsequent to cleavage of its signal peptide.

The predicted domain structure of the protein encoded
by splice variant 1A is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 181.
The predicted domain structure of the protein encoded
by splice variant 1B is identical to that of the protein
encoded by the 2.1 kb transcript up to amino acid 180.

The murine TANGO 189 cDNA of SEQ ID NO:42 has a 759
nucleotide open reading frame (SEQ ID NO:52) encoding a
253 amino acid protein (SEQ ID NO:62). The cDNA and
protein sequences of murine TANGO 189 are shown in Figure
18.

Figure 30 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:31; splice variant
1A) and murine (SEQ ID NO:62) TANGO 189 (91.7% identity).
Figure 40 depicts an alignment of the cDNA sequences of
human (SEQ ID NO:9; splice variant 1A) and murine (SEQ ID
NO:42) TANGO 189 (51.8% identity).

Northern analysis of human TANGO 189 mRNA expression
revealed the presence of one major transcript (2.1 kb)
and four minor transcripts (3.4 kb, 4.2 kb, 6 kb, and 7
kb). The 2.1 kb transcript is expressed at a high level
in brain, spinal cord, and testis; expressed at a low
level in heart, placenta, skeletal muscle, kidney,
pancreas, lung, thyroid, lymph node, trachea, adrenal,
bone marrow, spleen, ovary, and prostate; and expressed
at a very low level in liver, stomach, thymus, small
intestine, colon, and peripheral blood lymphocytes. The
3.4 kb, 4.2 kb, 6 kb, and 7 kb transcripts are expressed
at a moderate level in brain and spinal cord; and are not
expressed in testis. The 4.6 and 7 kb transcripts are
expressed at a moderate level in peripheral blood
lymphocytes.

Murine in situ expression analysis revealed that TANGO
189 is expressed strongly and almost ubiquitously in the
mouse embryo. Tissues with the highest expression during
embryogenesis are the brain, spinal cord, and small
intestine. Expression decreases in most if not all
tissues by postnatal day 1.5, but the tissues of highest
expression remain the brain, spinal cord, and small
intestine. This pattern continues into the adult mouse,
with expression in most tissues decreasing even more,
some to background levels. Of the adult tissues tested,
the brain, spleen, small intestine, and retina have the
highest signal. High level expression is observed in the
following adult tissues: placenta (ubiquitous), small
intestine (except villi), eye (retina), and brain
(ubiquitous). Lower expression is observed in: bladder
(stronger signal in the transitional epithelium), kidney,
thymus, liver, placenta, spleen, and colon. Expression
was not observed in: heart, skeletal muscle, diaphragm,
lung, and pancreas. Embryonic expression was observed at
stages E13.5 through E17.5 (high ubiquitous signal;
brain, spinal cord, and small intestine have the
strongest signal) and P1.5 (ubiquitous signal decreased
in intensity; brain, spinal cord, small intestine, and
kidney have the strongest signal).

TANGO 189 is useful as a tissue-specific marker. The
expression of TANGO 189 may be altered in a variety of
disease states (e.g., cancer). Thus, TANGO 189 nucleic
acid molecules and polypeptides as well as anti-TANGO 189
antibodies and modulators of TANGO 189 expression or
activity may be useful in the treatment of disorders of
cell proliferation and differentiation.



TANGO 215 

The human TANGO 215 cDNA of SEQ ID NO:10 has a 2160
nucleotide open reading frame (SEQ ID NO:21) encoding a
720 amino acid protein (SEQ ID NO:32). The cDNA and
protein sequences of human TANGO 215 are shown in Figure
19.

The cDNA sequence (SEQ ID NO:__) and predicted amino
acid sequence (SEQ ID NO:__) of a full-length murine
TANGO 181 clone are shown in Figure 56.

Human TANGO 215 is predicted to be a wholly secreted
protein having a 21 amino acid signal sequence (amino
acids 1 - 21 of SEQ ID NO:32; SEQ ID NO:74) followed by a
699 amino acid mature protein (amino acids 22 - 720 of
SEQ ID NO:32; SEQ ID NO:86). TANGO 215 is predicted to
have a molecular weight of 80.3 kDa prior to cleavage of
its signal peptide and a molecular weight of 77.6 kDa
subsequent to cleavage of its signal peptide.

TANGO 215 is related to C1r/C1s (C1q) and MASP1/MASP2
(mannose-binding lectin-associated serine protease)
proteases, all of which are involved in the alternative
pathway of immune response.

TANGO 215 may be a threonine protease. There is a
threonine in the sequence TGG at amino acids 664-666 of
human and murine TANGO 215. This sequence is within a
region having similarity to the active site of certain
proteases. Human TANGO 215 is predicted to have a CUB
domain (amino acids 128 - 236 of SEQ ID NO:32), an EGF
domain (amino acids 239 - 271 of SEQ ID NO:32), a small
consensus repeat (SCR) domain (amino acids 280 - 342 of
SEQ ID NO:32), a partial SCR domain (amino acids 408 -
442 of SEQ ID NO:32), and a serine protease domain (amino
acids 461 - 720 of SEQ ID NO:32).
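
The predicted domain coordinates listed above lend themselves
to a small tabular representation. The Python sketch below is
illustrative only; the class and function names are
hypothetical, and the consistency rule (domains ordered,
non-overlapping, and within the 720 amino acid protein) is an
assumption chosen for the example rather than a requirement
stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # first amino acid, 1-based, inclusive
    end: int    # last amino acid, 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates as given for human TANGO 215 (SEQ ID NO:32).
TANGO_215_DOMAINS = [
    Domain("CUB", 128, 236),
    Domain("EGF", 239, 271),
    Domain("SCR", 280, 342),
    Domain("partial SCR", 408, 442),
    Domain("serine protease", 461, 720),
]


def domains_are_consistent(domains, protein_length: int) -> bool:
    """True if domains are in order, do not overlap, and fit in the protein."""
    previous_end = 0
    for d in domains:
        if d.start <= previous_end or d.end < d.start or d.end > protein_length:
            return False
        previous_end = d.end
    return True


for d in TANGO_215_DOMAINS:
    print(f"{d.name}: {d.start}-{d.end} ({d.length} amino acids)")
print(domains_are_consistent(TANGO_215_DOMAINS, protein_length=720))
```

With these coordinates the CUB domain spans 109 amino acids and
the serine protease domain spans 260 amino acids.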

Northern analysis of human TANGO 215 mRNA expression
revealed the presence of a 2.7 kb transcript in heart,
brain, and placenta.

In situ analysis of TANGO 215 expression in adult mice
revealed expression in the brain (cortex and caudate
putamen), kidney (cortex, most likely within the
glomeruli), bladder (ubiquitous expression), liver
(possibly within vessels), and placenta (outer membrane
region). This analysis did not detect expression in the
lung, small intestine, pancreas, thymus, eye, heart, or
muscle/diaphragm.

In situ analysis of TANGO 215 in embryos revealed
expression at E13.5 in developing limbs and vertebrae.
At E14.5 the observed expression pattern was similar to
that at E13.5 except that expression was observed in the
muscle surrounding the abdomen, the skin, and the jaw. At
E15.5 expression was observed in the developing kidney
and bladder and the outer layer of the tongue. At later
ages, E16.5 through P1.5, expression is observed in the
smooth muscle layer of the small intestine, the portal
regions of the liver, and the large airways of the lungs.
Expression in the brain is absent until E18.5, when
expression is apparent in the caudate putamen.
Expression remains strong at P1.5 in the vertebrae, tail,
and sternum and possibly the muscle between developing
bones.

The region of human TANGO 215 from amino acid 280 to
the end is predicted to be the human homologue of Limulus
Factor C (27% identity). Thus, this region of TANGO 215
is predicted to include an effector domain (serine
protease domain) and, perhaps, an LPS sensing domain.
Thus, TANGO 215 may sense and respond to LPS, with the
response to the presence of LPS being activation of
serine protease activity. Accordingly, TANGO 215 nucleic
acids and polypeptides as well as antibodies directed
against TANGO 215 and modulators of TANGO 215 expression
or activity may be useful in the diagnosis and treatment
of sepsis.

CUB domains are extracellular domains of about 110
amino acids. CUB domains are found in functionally
diverse, mostly developmentally regulated proteins. Most
contain four cysteines that are involved in two disulfide
bonds (C1-C2 and C3-C4). SCR domains are also known as
complement control protein (CCP) modules. EGF domains
are commonly involved in receptor-ligand interactions.
CUB, EGF, and SCR domains are commonly involved in
protein-protein interaction. Because these domains are
present in TANGO 215, it is predicted to interact with
one or more other proteins. The presence of these
domains in TANGO 215 suggests that TANGO 215 is involved
in development, perhaps bone and cartilage morphogenesis.
TANGO 215 nucleic acid molecules and polypeptides as well
as anti-TANGO 215 antibodies and modulators of TANGO 215
expression or activity may be useful in the treatment of
developmental disorders.

TANGO 187

The human TANGO 187-1/3 cDNA of SEQ ID NO:11 has a 1032
nucleotide open reading frame (SEQ ID NO:22) encoding a
343 amino acid protein (SEQ ID NO:33). The cDNA and
protein sequences of human TANGO 187-1/3 are shown in
Figure 20.

Human TANGO 187-1/3 is predicted to be a wholly
secreted protein having a 20 amino acid signal sequence
(amino acids 1 - 20 of SEQ ID NO:33; SEQ ID NO:75)
followed by a 323 amino acid mature protein (amino acids
21 - 343 of SEQ ID NO:33; SEQ ID NO:87). Human TANGO
187-1/3 is predicted to have a molecular weight of 37.5
kDa prior to cleavage of its signal peptide and a
molecular weight of 35.9 kDa subsequent to cleavage of
its signal peptide.

The TANGO 187-1/3 cDNA described above actually
represents one of 8 different TANGO 187 splice variants.
Each variant contains none, one, two or three of three
variant regions. These regions are referred to as region
1, region 2, and region 3, and each of the various forms
of TANGO 187 is referred to by including a reference to
the variant regions present. Thus, the form of TANGO 187
described above is TANGO 187-1/3 because it includes
regions 1 and 3.
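
The count of 8 splice variants follows from the fact that each
of the three variant regions is independently present or
absent (2 x 2 x 2 = 8). The Python sketch below enumerates the
combinations using the naming convention described above; the
function name is hypothetical and the code is offered only as
an illustration of the counting argument.

```python
from itertools import combinations


def variant_names(base: str = "TANGO 187", regions=(1, 2, 3)) -> list:
    """List every form obtained by including any subset of the variant regions."""
    names = []
    for k in range(len(regions) + 1):
        for combo in combinations(regions, k):
            suffix = "/".join(str(r) for r in combo)
            # The form lacking all variant regions carries no suffix.
            names.append(f"{base}-{suffix}" if suffix else base)
    return names


for name in variant_names():
    print(name)
```

The enumeration yields the form with no variant regions, three
forms with one region, three forms with two regions (including
TANGO 187-1/3), and one form with all three regions.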

Figure 46 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1.

Figure 47 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-2/3.

Figure 48 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1/2/3.

Figure 49 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-1/2.

Figure 50 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-2.

Figure 51 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187-3.

Figure 52 depicts the cDNA sequence (SEQ ID NO:__) and
predicted amino acid sequence (SEQ ID NO:__) of TANGO
187. This form does not include any of the three variant
regions.

The murine TANGO 187 cDNA of SEQ ID NO:43 is only a
partial sequence. This cDNA has an open reading frame
extending from nucleotide 73 to the end of the available
sequence (SEQ ID NO:53) encoding a 152 amino acid protein
(SEQ ID NO:63). The partial cDNA and protein sequences
of murine TANGO 187 are shown in Figure 21.

Figure 31 depicts an alignment of the predicted amino
acid sequences of human (SEQ ID NO:33) and murine (SEQ ID
NO:63; partial) TANGO 187 (50.4% identity). Figure 41
depicts an alignment of the cDNA sequences of human (SEQ
ID NO:11) and murine (SEQ ID NO:43; partial) TANGO 187
(66.0% identity).

Northern analysis of human TANGO 187 mRNA expression
revealed the presence of 1.3 and 2.4 kb transcripts that
are approximately equally expressed at a low level in
heart, brain, lung, liver, and smooth muscle and at a
moderate level in kidney and placenta.

In situ analysis of TANGO 187 expression in adult mice
revealed that TANGO 187 is expressed in brain (weak,
ubiquitous signal), eye and Harderian gland (weak signal
in the retina), submandibular gland (weak, ubiquitous
signal), stomach (weak, ubiquitous signal), kidney (weak,
ubiquitous signal), adrenal gland (low level, ubiquitous
expression), colon (low level, ubiquitous expression),
small intestine (low level, ubiquitous expression),
thymus (moderate level, ubiquitous expression in the
cortical region with lower expression in the medulla),
lymph node (ubiquitous expression), spleen (low level,
ubiquitous expression with lower expression in the
follicles), bladder (moderate expression in the mucosal
epithelium), and testes (moderate, ubiquitous expression
signal that defines the seminiferous vesicles). In this
analysis, TANGO 187 expression was not detectable in the
spinal cord, brown fat, heart, lung, liver, pancreas,
skeletal muscle, and ovaries.

In situ analysis of TANGO 187 expression in embryos at
E13.5 revealed ubiquitous expression with the strongest
expression in the brain and spinal cord. A punctate
expression pattern was observed in the lungs, suggestive
of higher expression in the developing large airways. At
E14.5 the expression pattern was similar to that observed
at E13.5 except that expression was observed in the
developing olfactory system and the eye at a level
similar to that observed in the brain and spinal cord.
Expression is also present at E14.5 in the epithelium of
the tongue, the dermis of the snout, the kidneys and the
stomach. At E15.5 low level ubiquitous expression was
observed with the highest expression in the brain, spinal
cord, eye, and olfactory system. Slightly lower
expression was observed in the lung (ubiquitous
expression) and kidney (cortical region) than in the
aforementioned neuronal tissues. At E16.5 the observed
expression pattern is identical to that seen at E15.5
except that TANGO 187 expression is observed in the
thymus and the mucosal portion of the stomach. At E18.5
TANGO 187 expression continues to be highest in neuronal
tissue, with lower expression in the hind brain and
spinal cord than in the forebrain, and with the
neopallial cortex having the highest signal. At E16.5
expression is observed in the thymus and small intestine.
At P1.5 the observed expression pattern is nearly
identical to that at E18.5 except that expression in the
lung and stomach has decreased. At P1.5 expression is
highest in the brain, eye, olfactory epithelium and
kidney.

TANGO 187 contains a region moderately similar to an
armadillo/beta-catenin repeat. Such repeats are thought
to be involved in protein-protein interactions.

TABLE 1: Summary of Human TANGO 180, TANGO 181, TANGO
182, TANGO 183, TANGO 184, TANGO 185, TANGO
186, TANGO 187, TANGO 188, TANGO 189, and
TANGO 215 Sequence Information.

TABLE 2: Summary of Domains of Human TANGO 180, TANGO

TABLE 3: Summary of Murine TANGO 180, TANGO 180, TANGO
181, TANGO 182, TANGO 183, TANGO 184, TANGO
185, TANGO 186, TANGO 188, TANGO 189, and

Various aspects of the invention are described in
further detail in the following subsections.

I. Isolated Nucleic Acid Molecules

One aspect of the invention pertains to isolated
nucleic acid molecules that encode a polypeptide of the
invention or a biologically active portion thereof, as
well as nucleic acid molecules sufficient for use as
hybridization probes to identify nucleic acid molecules
encoding a polypeptide of the invention and fragments of
such nucleic acid molecules suitable for use as PCR
primers for the amplification or mutation of nucleic acid
molecules. As used herein, the term "nucleic acid
molecule" is intended to include DNA molecules (e.g.,
cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and
analogs of the DNA or RNA generated using nucleotide
analogs. The nucleic acid molecule can be
single-stranded or double-stranded, but preferably is
double-stranded DNA.

An "isolated" nucleic acid molecule is one which is
separated from other nucleic acid molecules which are
present in the natural source of the nucleic acid
molecule. Preferably, an "isolated" nucleic acid
molecule is free of sequences (preferably protein
encoding sequences) which naturally flank the nucleic
acid (i.e., sequences located at the 5' and 3' ends of
the nucleic acid) in the genomic DNA of the organism from
which the nucleic acid is derived. For example, in
various embodiments, the isolated nucleic acid molecule
can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb,
0.5 kb or 0.1 kb of nucleotide sequences which naturally
flank the nucleic acid molecule in genomic DNA of the
cell from which the nucleic acid is derived. Moreover,
an "isolated" nucleic acid molecule, such as a cDNA
molecule, can be substantially free of other cellular
material, or culture medium when produced by recombinant
techniques, or substantially free of chemical precursors
or other chemicals when chemically synthesized.

A nucleic acid molecule of the present invention, e.g.,
a nucleic acid molecule having the nucleotide sequence of
any of SEQ ID NOs:1-22, 34-43, and __-__ or the cDNA
of a clone deposited as any of ATCC 98899, 98900, and
989001, or a complement thereof, can be isolated using
standard molecular biology techniques and the sequence
information provided herein. Using all or a portion of
the nucleic acid sequences of any of SEQ ID NOs:1-22,
34-43, and __-__ or the cDNA of a clone deposited as any
of ATCC 98899, 98900, and 989001 as a hybridization
probe, nucleic acid molecules of the invention can be
isolated using standard hybridization and cloning
techniques (e.g., as described in Sambrook et al., eds.,
Molecular Cloning: A Laboratory Manual, 2nd ed., Cold
Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989).

A nucleic acid molecule of the invention can be
amplified using cDNA, mRNA or genomic DNA as a template
and appropriate oligonucleotide primers according to
standard PCR amplification techniques. The nucleic acid
so amplified can be cloned into an appropriate vector and
characterized by DNA sequence analysis. Furthermore,
oligonucleotides corresponding to all or a portion of a
nucleic acid molecule of the invention can be prepared by
standard synthetic techniques, e.g., using an automated
DNA synthesizer.

In another preferred embodiment, an isolated nucleic
acid molecule of the invention comprises a nucleic acid
molecule which is a complement of the nucleotide sequence
shown in SEQ ID NOs:1-22, 34-43, and __-__ or the
cDNA of a clone deposited as ATCC 98899, 98900, and
989001, or a portion thereof. A nucleic acid molecule
which is complementary to a given nucleotide sequence is
one which is sufficiently complementary to the given
nucleotide sequence that it can hybridize to the given
nucleotide sequence thereby forming a stable duplex.
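
The simplest case of complementarity, perfect Watson-Crick
base pairing along the full length of a DNA strand, can be
sketched in Python as follows. The function names are
hypothetical and the input is an invented toy sequence. Note
that the definition above is broader: it requires only enough
complementarity to form a stable duplex, which this sketch
does not attempt to model.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(strand_a: str, strand_b: str) -> bool:
    """True if strand_b pairs with strand_a at every position (no mismatches)."""
    return reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("ATGGCC"))
print(is_perfect_complement("ATGGCC", "GGCCAT"))
```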

Moreover, a nucleic acid molecule of the invention can
comprise only a portion of a nucleic acid sequence
encoding a full length polypeptide of the invention, for
example, a fragment which can be used as a probe or
primer or a fragment encoding a biologically active
portion of a polypeptide of the invention. The nucleotide
sequence determined from the cloning of one gene allows
for the generation of probes and primers designed for use
in identifying and/or cloning homologues in other cell
types, e.g., from other tissues, as well as homologues
from other mammals. The probe/primer typically comprises
substantially purified oligonucleotide. The
oligonucleotide typically comprises a region of
nucleotide sequence that hybridizes under stringent
conditions to at least about 12, preferably about 25,
more preferably about 50, 75, 100, 125, 150, 175, 200,
250, 300, 350 or 400 consecutive nucleotides of the sense
or anti-sense sequence of any of SEQ ID NOs:1-22, 34-43,
and __-__ or the cDNA of a clone deposited as ATCC
98899, 98900, and 989001 or of a naturally occurring
mutant of any of SEQ ID NOs:1-22, 34-43, and __-__ or
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001.

Probes based on the sequence of a nucleic acid molecule
of the invention can be used to detect transcripts or
genomic sequences encoding the same protein molecule
encoded by a selected nucleic acid molecule. The probe
comprises a label group attached thereto, e.g., a
radioisotope, a fluorescent compound, an enzyme, or an
enzyme co-factor. Such probes can be used as part of a
diagnostic test kit for identifying cells or tissues
which mis-express the protein, such as by measuring
levels of a nucleic acid molecule encoding the protein in
a sample of cells from a subject, e.g., detecting mRNA
levels or determining whether a gene encoding the protein
has been mutated or deleted.

A nucleic acid fragment encoding a "biologically active
portion" of a polypeptide of the invention can be
prepared by isolating a portion of any of SEQ ID
NOs:1-22, 34-43, and __-__ or the nucleotide sequence of
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001 which encodes a polypeptide having a biological
activity, expressing the encoded portion of the
polypeptide (e.g., by recombinant expression in
vitro) and assessing the activity of the encoded portion
of the polypeptide.

The invention further encompasses nucleic acid
molecules that differ from the nucleotide sequence of SEQ
ID NOs:1-22, 34-43, and __-__ or the cDNA of a clone
of ATCC 98899, 98900, and 989001 due to degeneracy of the
genetic code and thus encode the same protein as that
encoded by the nucleotide sequence shown in any of SEQ ID
NOs:1-22, 34-43, and __-__ or the cDNA of a clone
deposited as ATCC 98899, 98900, and 989001.
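
Degeneracy of the genetic code can be illustrated with a short
Python sketch in which two different nucleotide sequences
translate to the same protein. The code uses the standard
genetic code; the function name and the two nine-nucleotide
sequences are invented for illustration and are not taken from
any sequence of the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" is stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    protein = []
    for i in range(0, len(seq), 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two toy sequences differing at three nucleotide positions.
first = "ATGGCTTTA"
second = "ATGGCCCTG"
print(translate(first), translate(second))
print(translate(first) == translate(second))
```

Both toy sequences encode Met-Ala-Leu even though they differ
at three of nine nucleotide positions, because alanine and
leucine are each specified by more than one codon.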

In addition to the nucleotide sequences shown in SEQ ID
NOs:1-22, 34-43, and __-__ and present in the cDNAs of
the clones deposited as ATCC 98899, 98900, and 989001, it
will be appreciated by those skilled in the art that DNA
sequence polymorphisms that lead to changes in the amino
acid sequence may exist within a population (e.g., the
human population). Such genetic polymorphisms may exist
among individuals within a population due to natural
allelic variation. An allele is one of a group of genes
which occur alternatively at a given genetic locus. As
used herein, the phrase "allelic variant" refers to a
nucleotide sequence which occurs at a given locus or to a
polypeptide encoded by the nucleotide sequence. As used
herein, the terms "gene" and "recombinant gene" refer to
nucleic acid molecules comprising an open reading frame
encoding a polypeptide of the invention. Such natural
allelic variations can typically result in 1-5% variance
in the nucleotide sequence of a given gene. Alternative
alleles can be identified by sequencing the gene of
interest in a number of different individuals. This can
be readily carried out by using hybridization probes to
identify the same genetic locus in a variety of
individuals. Any and all such nucleotide variations and
resulting amino acid polymorphisms or variations that are
the result of natural allelic variation and that do not
alter the functional activity are intended to be within
the scope of the invention.

Moreover, nucleic acid molecules encoding proteins of
the invention from other species (homologues), which have
a nucleotide sequence which differs from that of the
human protein described herein, are intended to be within
the scope of the invention. Nucleic acid molecules
corresponding to natural allelic variants and homologues
of a cDNA of the invention can be isolated based on their
identity to the human nucleic acid molecule disclosed
herein using the human cDNAs, or a portion thereof, as a
hybridization probe according to standard hybridization
techniques under stringent hybridization conditions. For
example, a cDNA encoding a soluble form of a
membrane-bound protein of the invention can be isolated
based on its hybridization to a nucleic acid molecule
encoding all or part of the membrane-bound form.
Likewise, a cDNA encoding a membrane-bound form can be
isolated based on its hybridization to a nucleic acid
molecule encoding all or part of the soluble form.

Accordingly, in another embodiment, an isolated nucleic
acid molecule of the invention is at least 300 (325, 350,
375, 400, 425, 450, 500, 550, 600, 650, 700, 800, 900,
1000, or 1290) nucleotides in length and hybridizes under
stringent conditions to the nucleic acid molecule
comprising the nucleotide sequence, preferably the coding
sequence, of any of SEQ ID NOs:1-22, 34-43, and __-__,
the cDNA of a clone deposited as ATCC 98899, 98900, and
989001, or a complement thereof.

As used herein, the term "hybridizes under stringent
conditions" is intended to describe conditions for
hybridization and washing under which nucleotide
sequences at least 60% (65%, 70%, preferably 75%)
identical to each other typically remain hybridized to
each other. Such stringent conditions are known to those
skilled in the art and can be found in Current Protocols
in Molecular Biology, John Wiley & Sons, N.Y. (1989),
6.3.1-6.3.6. A preferred, non-limiting example of
stringent hybridization conditions is hybridization in
6X sodium chloride/sodium citrate (SSC) at about 45°C,
followed by one or more washes in 0.2X SSC, 0.1% SDS at
50-65°C. Preferably, an isolated nucleic acid molecule
of the invention that hybridizes under stringent
conditions to the sequence of any of SEQ ID NOs:1-22,
34-43, and __-__, the cDNA of ATCC 98899, 98900, and
989001, or the complement thereof, corresponds to a
naturally-occurring nucleic acid molecule. As used
herein, a "naturally-occurring" nucleic acid molecule
refers to an RNA or DNA molecule having a nucleotide
sequence that occurs in nature (e.g., encodes a natural
protein).

In addition to naturally-occurring allelic variants of
a nucleic acid molecule of the invention that
may exist in the population, the skilled artisan will
further appreciate that changes can be introduced by
mutation, thereby leading to changes in the amino acid
sequence of the encoded protein, without altering the
biological activity of the protein. For example, one can
make nucleotide substitutions leading to amino acid
substitutions at "non-essential" amino acid residues. A
"non-essential" amino acid residue is a residue that can
be altered from the wild-type sequence without altering
the biological activity, whereas an "essential" amino
acid residue is required for biological activity. For
example, amino acid residues that are not conserved or
only semi-conserved among homologues of various species
may be non-essential for activity and thus would be
likely targets for alteration. Alternatively, amino acid
residues that are conserved among the homologues of
various species (e.g., murine and human) may be essential
for activity and thus would not be likely targets for
alteration. Conserved cysteine residues are particularly
important and are preferably retained in functional
variants.

Accordingly, another aspect of the invention
pertains to nucleic acid molecules encoding a polypeptide
of the invention that contain changes in amino acid
residues that are not essential for activity. Such
polypeptides differ in amino acid sequence from SEQ ID
NOs:23-33, 54-63, and __-__ yet retain biological
activity. In one embodiment, the isolated nucleic acid
molecule includes a nucleotide sequence encoding a
protein that includes an amino acid sequence that is at
least about 45%, 65%, 75%, 85%, 95%, or 98%
identical to the amino acid sequence of any of SEQ ID
NOs:23-33, 54-63, and __-__.

An isolated nucleic acid molecule encoding a variant
protein can be created by introducing one or more
nucleotide substitutions, additions or deletions into the
nucleotide sequence of SEQ ID NOs:1-22, 34-43, and __-__
or the cDNA of a clone deposited as ATCC 98899, 98900,
and 989001 such that one or more amino acid
substitutions, additions or deletions are introduced into
the encoded protein. Mutations can be introduced by
standard techniques, such as site-directed mutagenesis
and PCR-mediated mutagenesis. Preferably, conservative
amino acid substitutions are made at one or more
predicted non-essential amino acid residues. A
"conservative amino acid substitution" is one in which
the amino acid residue is replaced with an amino acid
residue having a similar side chain. Families of amino
acid residues having similar side chains have been
defined in the art. These families include amino acids
with basic side chains (e.g., lysine, arginine,
histidine), acidic side chains (e.g., aspartic acid,
glutamic acid), uncharged polar side chains (e.g.,
glycine, asparagine, glutamine, serine, threonine,
tyrosine, cysteine), nonpolar side chains (e.g., alanine,
valine, leucine, isoleucine, proline, phenylalanine,
methionine, tryptophan), beta-branched side chains (e.g.,
threonine, valine, isoleucine) and aromatic side chains
(e.g., tyrosine, phenylalanine, tryptophan, histidine).
Alternatively, mutations can be introduced randomly along
all or part of the coding sequence, such as by
saturation mutagenesis, and the resultant mutants can be
screened for biological activity to identify mutants that
retain activity. Following mutagenesis, the encoded
protein can be expressed recombinantly and the activity
of the protein can be determined.
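
The side chain families listed above can be encoded directly
as sets, so that a proposed substitution can be checked for
whether both residues share a family. The Python sketch below
is illustrative only: the one-letter amino acid codes, the
function names, and the rule that sharing any one listed
family counts as conservative are assumptions made for the
example. Some residues (for example histidine, threonine and
tyrosine) appear in more than one family.

```python
# Families as listed in the text, written with one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original: str, replacement: str) -> list:
    """Names of the families that contain both residues."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if original in members and replacement in members
    )


def is_conservative(original: str, replacement: str) -> bool:
    """True if the residues differ but share at least one side chain family."""
    return original != replacement and bool(shared_families(original, replacement))


print(is_conservative("K", "R"), shared_families("K", "R"))
print(is_conservative("D", "K"), shared_families("D", "K"))
print(is_conservative("V", "I"), shared_families("V", "I"))
```

Whether a substitution that is conservative by this rule
actually preserves activity still has to be determined
experimentally, as described above.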

In a preferred embodiment, a mutant polypeptide that is
a variant of a polypeptide of the invention can be
assayed for: (1) the ability to form protein:protein
interactions with proteins in a signalling pathway of the
polypeptide of the invention; (2) the ability to bind a
ligand of the polypeptide of the invention; or (3) the
ability to bind to an intracellular target protein of the
polypeptide of the invention. In yet another preferred
embodiment, the mutant polypeptide can be assayed for the
ability to modulate cellular proliferation or cellular
differentiation.

The present invention encompasses antisense nucleic
acid molecules, i.e., molecules which are complementary
to a sense nucleic acid encoding a polypeptide of the
invention, e.g., complementary to the coding strand of a
double-stranded cDNA molecule or complementary to an mRNA
sequence. Accordingly, an antisense nucleic acid can
hydrogen bond to a sense nucleic acid. The antisense
nucleic acid can be complementary to an entire coding
strand, or to only a portion thereof, e.g., all or part
of the protein coding region (or open reading frame). An
antisense nucleic acid molecule can be antisense to all
or part of a noncoding region of the coding strand of a
nucleotide sequence encoding a polypeptide of the
invention. The noncoding regions ("5' and 3'
untranslated regions") are the 5' and 3' sequences which
flank the coding region and are not translated into amino
acids.

An antisense oligonucleotide can be, for example,
about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides
in length. An antisense nucleic acid of the invention
can be constructed using chemical synthesis and enzymatic
ligation reactions using procedures known in the art.
For example, an antisense nucleic acid (e.g., an
antisense oligonucleotide) can be chemically synthesized
using naturally occurring nucleotides or variously
modified nucleotides designed to increase the biological
stability of the molecules or to increase the physical
stability of the duplex formed between the antisense and
sense nucleic acids, e.g., phosphorothioate derivatives
and acridine substituted nucleotides can be used.
Examples of modified nucleotides which can be used to
generate the antisense nucleic acid include
5-fluorouracil, 5-bromouracil, 5-chlorouracil,
5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil,
5-methoxyuracil, 2-methylthio-N6-isopentenyladenine,
uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil,
2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic
acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine. Alternatively, the
antisense nucleic acid can be produced biologically using
an expression vector into which a nucleic acid has been
subcloned in an antisense orientation (i.e., RNA
transcribed from the inserted nucleic acid will be of an
antisense orientation to a target nucleic acid of
interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention
are typically administered to a subject or generated in
situ such that they hybridize with or bind to cellular
mRNA and/or genomic DNA encoding a selected polypeptide
of the invention to thereby inhibit expression, e.g., by
inhibiting transcription and/or translation. The
hybridization can be by conventional nucleotide
complementarity to form a stable duplex, or, for example,
in the case of an antisense nucleic acid molecule which
binds to DNA duplexes, through specific interactions in
the major groove of the double helix. An example of a
route of administration of antisense nucleic acid
molecules of the invention includes direct injection at a
tissue site. Alternatively, antisense nucleic acid
molecules can be modified to target selected cells and
then administered systemically. For example, for
systemic administration, antisense molecules can be
modified such that they specifically bind to receptors or
antigens expressed on a selected cell surface, e.g., by
linking the antisense nucleic acid molecules to peptides
or antibodies which bind to cell surface receptors or
antigens. The antisense nucleic acid molecules can also
be delivered to cells using the vectors described herein.
To achieve sufficient intracellular concentrations of the
antisense molecules, vector constructs in which the
antisense nucleic acid molecule is placed under the
control of a strong pol II or pol III promoter are
preferred.

An antisense nucleic acid molecule of the invention can
be an α-anomeric nucleic acid molecule. An α-anomeric
nucleic acid molecule forms specific double-stranded
hybrids with complementary RNA in which, contrary to the
usual β-units, the strands run parallel to each other
(Gaultier et al. (1987) Nucleic Acids Res. 15:6625-6641).
The antisense nucleic acid molecule can also comprise a
2'-O-methylribonucleotide (Inoue et al. (1987) Nucleic
Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue
(Inoue et al. (1987) FEBS Lett. 215:327-330).
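
By way of illustration only, the base-pairing logic behind a
conventional antisense oligonucleotide can be sketched in a
few lines of Python. The function name antisense_oligo, the
window arguments, and the example sequence are hypothetical
and are not taken from the sequences disclosed herein; the
sketch covers unmodified DNA oligonucleotides only and does
not model the modified nucleotides or α-anomeric molecules
discussed above.

```python
# Illustrative sketch: derive a DNA antisense oligonucleotide (written 5'->3')
# that is complementary to a chosen window of a target mRNA.
# All names and the example sequence are assumptions for illustration.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(target_mrna: str, start: int, length: int) -> str:
    """Return the reverse complement of target_mrna[start:start + length]."""
    if length <= 0:
        raise ValueError("length must be positive")
    segment = target_mrna.upper()[start:start + length]
    if len(segment) != length:
        raise ValueError("requested window runs past the end of the target")
    # Read the target window backwards and complement each base,
    # so the result is antiparallel to the sense strand.
    return "".join(COMPLEMENT[base] for base in reversed(segment))


if __name__ == "__main__":
    mrna = "AUGGCCAAGCUGACCUGA"  # made-up sequence, not from the disclosure
    print(antisense_oligo(mrna, 0, 15))
```

The length argument can be set to any of the oligonucleotide
lengths mentioned above (e.g., about 15 to 50 nucleotides).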

The invention also encompasses ribozymes. Ribozymes
are catalytic RNA molecules with ribonuclease activity
which are capable of cleaving a single-stranded nucleic
acid, such as an mRNA, to which they have a complementary
region. Thus, ribozymes (e.g., hammerhead ribozymes
(described in Haselhoff and Gerlach (1988) Nature
334:585-591)) can be used to catalytically cleave mRNA
transcripts to thereby inhibit translation of the protein
encoded by the mRNA. A ribozyme having specificity for a
nucleic acid molecule encoding a polypeptide of the
invention can be designed based upon the nucleotide
sequence of a cDNA disclosed herein. For example, a
derivative of a Tetrahymena L-19 IVS RNA can be
constructed in which the nucleotide sequence of the
active site is complementary to the nucleotide sequence
to be cleaved (see Cech et al. U.S. Patent No. 4,987,071;
and Cech et al. U.S. Patent No. 5,116,742).

Alternatively, an mRNA encoding a polypeptide of the
invention can be used to select a catalytic RNA having a
specific ribonuclease activity from a pool of RNA
molecules. See, e.g., Bartel and Szostak (1993) Science
261:1411-1418.

The invention also encompasses nucleic acid molecules
which form triple helical structures. For example,
expression of a polypeptide of the invention can be
inhibited by targeting nucleotide sequences complementary
to the regulatory region of the gene encoding the
polypeptide (e.g., the promoter and/or enhancer) to form
triple helical structures that prevent transcription of
the gene in target cells. See generally Helene (1991)
Anticancer Drug Des. 6(6):569-84; Helene (1992) Ann. N.Y.
Acad. Sci. 660:27-36; and Maher (1992) BioEssays
14(12):807-15.

In preferred embodiments, the nucleic acid molecules of
the invention can be modified at the base moiety, sugar
moiety or phosphate backbone to improve, e.g., the
stability, hybridization, or solubility of the molecule.
For example, the deoxyribose phosphate backbone of the
nucleic acids can be modified to generate peptide nucleic
acids (see Hyrup et al. (1996) Bioorganic & Medicinal
Chemistry 4(1):5-23). As used herein, the terms
"peptide nucleic acids" or "PNAs" refer to nucleic acid
mimics, e.g., DNA mimics, in which the deoxyribose
phosphate backbone is replaced by a pseudopeptide
backbone and only the four natural nucleobases are
retained. The neutral backbone of PNAs has been shown to
allow for specific hybridization to DNA and RNA under
conditions of low ionic strength. The synthesis of PNA
oligomers can be performed using standard solid phase
peptide synthesis protocols as described in Hyrup et al.
(1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl.
Acad. Sci. USA 93:14670-675.

PNAs can be used in therapeutic and diagnostic
applications. For example, PNAs can be used as antisense
or antigene agents for sequence-specific modulation of
gene expression by, e.g., inducing transcription or
translation arrest or inhibiting replication. PNAs can
also be used, e.g., in the analysis of single base pair
mutations in a gene by, e.g., PNA directed PCR clamping;
as artificial restriction enzymes when used in
combination with other enzymes, e.g., S1 nucleases (Hyrup
(1996), supra); or as probes or primers for DNA sequencing
and hybridization (Hyrup (1996), supra; Perry-O'Keefe et
al. (1996) Proc. Natl. Acad. Sci. USA 93:14670-675).
In another embodiment, PNAs can be modified, e.g., to
enhance their stability or cellular uptake, by attaching
lipophilic or other helper groups to PNA, by the
formation of PNA-DNA chimeras, or by the use of liposomes
or other techniques of drug delivery known in the art.
For example, PNA-DNA chimeras can be generated which may
combine the advantageous properties of PNA and DNA. Such
chimeras allow DNA recognition enzymes, e.g., RNAse H and
DNA polymerases, to interact with the DNA portion while
the PNA portion would provide high binding affinity and
specificity. PNA-DNA chimeras can be linked using
linkers of appropriate lengths selected in terms of base
stacking, number of bonds between the nucleobases, and
orientation (Hyrup (1996), supra). The synthesis of
PNA-DNA chimeras can be performed as described in Hyrup
(1996), supra, and Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63. For example, a DNA chain can be
synthesized on a solid support using standard
phosphoramidite coupling chemistry and modified
nucleoside analogs. Compounds such as
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite
can be used as a link between the PNA and the 5' end of
DNA (Mag et al. (1989) Nucleic Acids Res. 17:5973-88).
PNA monomers are then coupled in a stepwise manner to
produce a chimeric molecule with a 5' PNA segment and a
3' DNA segment (Finn et al. (1996) Nucleic Acids Res.
24(17):3357-63). Alternatively, chimeric molecules can
be synthesized with a 5' DNA segment and a 3' PNA segment
(Petersen et al. (1995) Bioorganic Med. Chem. Lett.
5:1119-1124).

In other embodiments, the oligonucleotide may include
other appended groups such as peptides (e.g., for
targeting host cell receptors in vivo), or agents
facilitating transport across the cell membrane (see,
e.g., Letsinger et al. (1989) Proc. Natl. Acad. Sci. USA
86:6553-6556; Lemaitre et al. (1987) Proc. Natl. Acad.
Sci. USA 84:648-652; PCT Publication No. WO 88/09810) or
the blood-brain barrier (see, e.g., PCT Publication No.
WO 89/10134). In addition, oligonucleotides can be
modified with hybridization-triggered cleavage agents
(see, e.g., Krol et al. (1988) Bio/Techniques 6:958-976)
or intercalating agents (see, e.g., Zon (1988) Pharm.
Res. 5:539-549). To this end, the oligonucleotide may be
conjugated to another molecule, e.g., a peptide,
hybridization-triggered cross-linking agent, transport
agent, hybridization-triggered cleavage agent, etc.

II. Isolated Proteins and Antibodies

One aspect of the invention pertains to isolated
proteins, and biologically active portions thereof, as
well as polypeptide fragments suitable for use as
immunogens to raise antibodies directed against a
polypeptide of the invention. In one embodiment, the
native polypeptide can be isolated from cells or tissue
sources by an appropriate purification scheme using
standard protein purification techniques. In another
embodiment, polypeptides of the invention are produced by
recombinant DNA techniques. As an alternative to recombinant
expression, a polypeptide of the invention can be
synthesized chemically using standard peptide synthesis
techniques.

An "isolated" or "purified" protein or biologically
active portion thereof is substantially free of cellular
material or other contaminating proteins from the cell or
tissue source from which the protein is derived, or
substantially free of chemical precursors or other
chemicals when chemically synthesized. The language
"substantially free of cellular material" includes
preparations of protein in which the protein is separated
from cellular components of the cells from which it is
isolated or recombinantly produced. Thus, protein that
is substantially free of cellular material includes
preparations of protein having less than about 30%, 20%,
10%, or 5% (by dry weight) of heterologous protein (also
referred to herein as a "contaminating protein"). When
the protein or biologically active portion thereof is
recombinantly produced, it is also preferably
substantially free of culture medium, i.e., culture
medium represents less than about 20%, 10%, or 5% of the
volume of the protein preparation. When the protein is
produced by chemical synthesis, it is preferably
substantially free of chemical precursors or other
chemicals, i.e., it is separated from chemical precursors
or other chemicals which are involved in the synthesis of
the protein. Accordingly, such preparations of the
protein have less than about 30%, 20%, 10%, or 5% (by dry
weight) of chemical precursors or compounds other than
the polypeptide of interest.

Biologically active portions of a polypeptide of the
invention include polypeptides comprising amino acid
sequences sufficiently identical to or derived from the
amino acid sequence of the protein (e.g., the amino acid
sequence shown in any of SEQ ID Nos:23-33, 54-63, and
____), which include fewer amino acids than the full length
protein, and exhibit at least one activity of the
corresponding full-length protein. Typically,
biologically active portions comprise a domain or motif
with at least one activity of the corresponding protein.
A biologically active portion of a protein of the
invention can be a polypeptide which is, for example, 10,
25, 50, 100 or more amino acids in length. Moreover,
other biologically active portions, in which other
regions of the protein are deleted, can be prepared by
recombinant techniques and evaluated for one or more of
the functional activities of the native form of a
polypeptide of the invention.

Preferred polypeptides have the amino acid sequence of
any of SEQ ID Nos:23-33, 54-63, and ____. Other
useful proteins are substantially identical (e.g., at
least about 45%, preferably 55%, 65%, 75%, 85%, 95%, or
99%) to any of SEQ ID Nos:22-33, 54-63, and ____ and
retain the functional activity of the
corresponding naturally-occurring protein yet differ in
amino acid sequence due to natural allelic variation or
mutagenesis.

To determine the percent identity of two amino acid
sequences or of two nucleic acids, the sequences are
aligned for optimal comparison purposes (e.g., gaps can
be introduced in the sequence of a first amino acid or
nucleic acid sequence for optimal alignment with a second
amino or nucleic acid sequence). The amino acid residues
or nucleotides at corresponding amino acid positions or
nucleotide positions are then compared. When a position
in the first sequence is occupied by the same amino acid
residue or nucleotide as the corresponding position in
the second sequence, then the molecules are identical at
that position. The percent identity between the two
sequences is a function of the number of identical
positions shared by the sequences (i.e., % identity = # of
identical positions/total # of positions (e.g.,
overlapping positions) x 100). Preferably, the two
sequences are the same length.
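
The percent identity formula given above can be illustrated
with a minimal Python sketch. It assumes the two sequences
have already been aligned to equal length, with "-" marking
any gap, and it takes the total number of positions to be
the alignment length; the function name percent_identity and
the toy peptides are hypothetical and are not sequences of
the invention.

```python
# Illustrative sketch of: % identity = identical positions / total positions x 100.
# Inputs are assumed to be pre-aligned strings of equal length ("-" = gap).
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # Count only exact matches; a gap never counts as an identical position.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Toy peptides differing at two of eight positions.
    print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```

Other choices of denominator (for example, counting only the
overlapping positions) would change the result for gapped
alignments, so the convention used should be stated alongside
any reported value.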

The determination of percent homology between two
sequences can be accomplished using a mathematical
algorithm. A preferred, non-limiting example of a
mathematical algorithm utilized for the comparison of two
sequences is the algorithm of Karlin and Altschul (1990)
Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in
Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA
90:5873-5877. Such an algorithm is incorporated into the
NBLAST and XBLAST programs of Altschul, et al. (1990) J.
Mol. Biol. 215:403-410. BLAST nucleotide searches can be
performed with the NBLAST program, score = 100,
wordlength = 12 to obtain nucleotide sequences homologous
to a nucleic acid molecule of the invention. BLAST
protein searches can be performed with the XBLAST
program, score = 50, wordlength = 3 to obtain amino acid
sequences homologous to a protein molecule of the
invention. To obtain gapped alignments for comparison
purposes, Gapped BLAST can be utilized as described in
Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402.
Alternatively, PSI-Blast can be used to perform an
iterated search which detects distant relationships
between molecules. Id. When utilizing BLAST, Gapped
BLAST, and PSI-Blast programs, the default parameters of
the respective programs (e.g., XBLAST and NBLAST) can be
used. See http://www.ncbi.nlm.nih.gov. Another
preferred, non-limiting example of a mathematical
algorithm utilized for the comparison of sequences is the
algorithm of Myers and Miller, (1988) CABIOS 4:11-17.
Such an algorithm is incorporated into the ALIGN program
(version 2.0) which is part of the GCG sequence alignment
software package. When utilizing the ALIGN program for
comparing amino acid sequences, a PAM120 weight residue
table, a gap length penalty of 12, and a gap penalty of 4
can be used.

The percent identity between two sequences can be 
determined using techniques similar to those described 
above, with or without allowing gaps. In calculating 
percent identity, only exact matches are counted. 

The invention also provides chimeric or fusion
proteins. As used herein, a "chimeric protein" or
"fusion protein" comprises all or part (preferably
biologically active) of a polypeptide of the invention
operably linked to a heterologous polypeptide (i.e., a
polypeptide other than the same polypeptide of the
invention). Within the fusion protein, the term
"operably linked" is intended to indicate that the
polypeptide of the invention and the heterologous
polypeptide are fused in-frame to each other. The
heterologous polypeptide can be fused to the N-terminus
or C-terminus of the polypeptide of the invention.

One useful fusion protein is a GST fusion protein in
which the polypeptide of the invention is fused to the
C-terminus of GST sequences. Such fusion proteins can
facilitate the purification of a recombinant polypeptide
of the invention.

In another embodiment, the fusion protein contains a
heterologous signal sequence at its N-terminus. For
example, the native signal sequence of a polypeptide of
the invention can be removed and replaced with a signal
sequence from another protein. For example, the gp67
secretory sequence of the baculovirus envelope protein
can be used as a heterologous signal sequence (Current
Protocols in Molecular Biology, Ausubel et al., eds.,
John Wiley & Sons, 1992). Other examples of eukaryotic
heterologous signal sequences include the secretory
sequences of melittin and human placental alkaline
phosphatase (Stratagene; La Jolla, California). In yet
another example, useful prokaryotic heterologous signal
sequences include the phoA secretory signal (Sambrook et
al., supra) and the protein A secretory signal (Pharmacia
Biotech; Piscataway, New Jersey).

In yet another embodiment, the fusion protein is an
immunoglobulin fusion protein in which all or part of a
polypeptide of the invention is fused to sequences
derived from a member of the immunoglobulin protein
family. The immunoglobulin fusion proteins of the
invention can be incorporated into pharmaceutical
compositions and administered to a subject to inhibit an
interaction between a ligand (soluble or membrane-bound)
and a protein on the surface of a cell (receptor), to
thereby suppress signal transduction in vivo. The
immunoglobulin fusion protein can be used to affect the
bioavailability of a cognate ligand of a polypeptide of
the invention. Inhibition of ligand/receptor interaction
may be useful therapeutically, both for treating
proliferative and differentiative disorders and for
modulating (e.g., promoting or inhibiting) cell survival.
Moreover, the immunoglobulin fusion proteins of the
invention can be used as immunogens to produce antibodies
directed against a polypeptide of the invention in a
subject, to purify ligands and in screening assays to
identify molecules which inhibit the interaction of
receptors with ligands.

Chimeric and fusion proteins of the invention can be
produced by standard recombinant DNA techniques. In
another embodiment, the fusion gene can be synthesized by
conventional techniques including automated DNA
synthesizers. Alternatively, PCR amplification of gene
fragments can be carried out using anchor primers which
give rise to complementary overhangs between two
consecutive gene fragments which can subsequently be
annealed and reamplified to generate a chimeric gene
sequence (see, e.g., Ausubel et al., supra). Moreover,
many expression vectors are commercially available that
already encode a fusion moiety (e.g., a GST polypeptide).
A nucleic acid encoding a polypeptide of the invention
can be cloned into such an expression vector such that
the fusion moiety is linked in-frame to the polypeptide
of the invention.
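
The in-frame requirement described above can be checked
computationally before a fusion construct is built. The
following Python sketch is an assumption-laden illustration,
not a cloning protocol: the function name fuse_in_frame, the
handling of the upstream stop codon, and the toy coding
sequences are all hypothetical, and linker or vector-derived
sequences are ignored.

```python
# Illustrative sketch: join an upstream coding sequence (e.g., a fusion moiety)
# to a downstream coding sequence so that the reading frame is preserved.
# Names and example sequences are assumptions for illustration.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(n_terminal_cds: str, c_terminal_cds: str) -> str:
    upstream = n_terminal_cds.upper()
    downstream = c_terminal_cds.upper()
    # A whole number of codons upstream keeps the downstream partner in frame.
    if len(upstream) % 3 != 0:
        raise ValueError("upstream coding sequence is not a multiple of 3")
    # Drop a terminal stop codon so translation reads through the junction.
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    # Any remaining in-frame stop would truncate the fusion protein.
    for i in range(0, len(upstream), 3):
        if upstream[i:i + 3] in STOP_CODONS:
            raise ValueError(f"in-frame stop codon at position {i}")
    return upstream + downstream


if __name__ == "__main__":
    print(fuse_in_frame("ATGGCCTAA", "ATGAAATGA"))
```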

A signal sequence of a polypeptide of the invention
(SEQ ID NOs:64-75) can be used to facilitate secretion
and isolation of the secreted protein or other proteins
of interest. Signal sequences are typically
characterized by a core of hydrophobic amino acids which
are generally cleaved from the mature protein during
secretion in one or more cleavage events. Such signal
peptides contain processing sites that allow cleavage of
the signal sequence from the mature proteins as they pass
through the secretory pathway. Thus, the invention
pertains to the described polypeptides having a signal
sequence, as well as to the signal sequence itself and to
the polypeptide in the absence of the signal sequence
(i.e., the cleavage products). In one embodiment, a
nucleic acid sequence encoding a signal sequence of the
invention can be operably linked in an expression vector
to a protein of interest, such as a protein which is
ordinarily not secreted or is otherwise difficult to
isolate. The signal sequence directs secretion of the
protein, such as from a eukaryotic host into which the
expression vector is transformed, and the signal sequence
is subsequently or concurrently cleaved. The protein can
then be readily purified from the extracellular medium by
art recognized methods. Alternatively, the signal
sequence can be linked to the protein of interest using a
sequence which facilitates purification, such as with a
GST domain.

In another embodiment, the signal sequences of the
present invention can be used to identify regulatory
sequences, e.g., promoters, enhancers, repressors. Since
signal sequences are the most amino-terminal sequences of
a peptide, it is expected that the nucleic acids which
flank the signal sequence on its amino-terminal side will
be regulatory sequences which affect transcription.
Thus, a nucleotide sequence which encodes all or a
portion of a signal sequence can be used as a probe to
identify and isolate signal sequences and their flanking
regions, and these flanking regions can be studied to
identify regulatory elements therein.

The present invention also pertains to variants of the
polypeptides of the invention. Such variants have an
altered amino acid sequence which can function as either
agonists (mimetics) or as antagonists. Variants can be
generated by mutagenesis, e.g., discrete point mutation
or truncation. An agonist can retain substantially the
same, or a subset, of the biological activities of the
naturally occurring form of the protein. An antagonist
of a protein can inhibit one or more of the activities of
the naturally occurring form of the protein by, for
example, competitively binding to a downstream or
upstream member of a cellular signaling cascade which
includes the protein of interest. Thus, specific
biological effects can be elicited by treatment with a
variant of limited function. Treatment of a subject with
a variant having a subset of the biological activities of
the naturally occurring form of the protein can have
fewer side effects in a subject relative to treatment
with the naturally occurring form of the protein.

Variants of a protein of the invention which function
as either agonists (mimetics) or as antagonists can be
identified by screening combinatorial libraries of
mutants, e.g., truncation mutants, of the protein of the
invention for agonist or antagonist activity. In one
embodiment, a variegated library of variants is generated
by combinatorial mutagenesis at the nucleic acid level
and is encoded by a variegated gene library. A
variegated library of variants can be produced by, for
example, enzymatically ligating a mixture of synthetic
oligonucleotides into gene sequences such that a
degenerate set of potential protein sequences is
expressible as individual polypeptides, or alternatively,
as a set of larger fusion proteins (e.g., for phage
display). There are a variety of methods which can be
used to produce libraries of potential variants of the
polypeptides of the invention from a degenerate
oligonucleotide sequence. Methods for synthesizing
degenerate oligonucleotides are known in the art (see,
e.g., Narang (1983) Tetrahedron 39:3; Itakura et al.
(1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984)
Science 198:1056; Ike et al. (1983) Nucleic Acid Res.
11:477).
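
To make the notion of a degenerate set concrete, the sketch
below expands a degenerate oligonucleotide written in IUPAC
nucleotide codes into every sequence it represents. It is
offered only as an illustration: the function name
expand_degenerate and the example codons are hypothetical,
and the sketch says nothing about how such oligonucleotides
are synthesized or ligated.

```python
# Illustrative sketch: enumerate the sequences encoded by a degenerate
# oligonucleotide written with IUPAC nucleotide codes.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list[str]:
    """Return all concrete sequences represented by a degenerate oligo."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(expand_degenerate("GAY"))       # both codons for aspartate
    print(len(expand_degenerate("NNK")))  # size of an NNK codon set
```

Because the number of sequences grows multiplicatively with
each degenerate position, full enumeration is practical only
for short oligonucleotides.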

In addition, libraries of fragments of the coding
sequence of a polypeptide of the invention can be used to
generate a variegated population of polypeptides for
screening and subsequent selection of variants. For
example, a library of coding sequence fragments can be
generated by treating a double stranded PCR fragment of
the coding sequence of interest with a nuclease under
conditions wherein nicking occurs only about once per
molecule, denaturing the double stranded DNA, renaturing
the DNA to form double stranded DNA which can include
sense/antisense pairs from different nicked products,
removing single stranded portions from reformed duplexes
by treatment with S1 nuclease, and ligating the resulting
fragment library into an expression vector. By this
method, an expression library can be derived which
encodes N-terminal and internal fragments of various
sizes of the protein of interest.

Several techniques are known in the art for screening
gene products of combinatorial libraries made by point
mutations or truncation, and for screening cDNA libraries
for gene products having a selected property. The most
widely used techniques, which are amenable to high
throughput analysis, for screening large gene libraries
typically include cloning the gene library into
replicable expression vectors, transforming appropriate
cells with the resulting library of vectors, and
expressing the combinatorial genes under conditions in
which detection of a desired activity facilitates
isolation of the vector encoding the gene whose product
was detected. Recursive ensemble mutagenesis (REM), a
technique which enhances the frequency of functional
mutants in the libraries, can be used in combination with
the screening assays to identify variants of a protein of
the invention (Arkin and Youvan (1992) Proc. Natl. Acad.
Sci. USA 89:7811-7815; Delagrave et al. (1993) Protein
Engineering 6(3):327-331).

An isolated polypeptide of the invention, or a fragment
thereof, can be used as an immunogen to generate
antibodies using standard techniques for polyclonal and
monoclonal antibody preparation. The full-length
polypeptide or protein can be used or, alternatively, the
invention provides antigenic peptide fragments for use as
immunogens. The antigenic peptide of a protein of the
invention comprises at least 8 (preferably 10, 15, 20, or
30) amino acid residues of the amino acid sequence shown
in any of SEQ ID Nos:23-33, 54-64, and ____ and
encompasses an epitope of the protein such that an
antibody raised against the peptide forms a specific
immune complex with the protein.

Preferred epitopes encompassed by the antigenic peptide
are regions that are located on the surface of the
protein, e.g., hydrophilic regions, rather than
hydrophobic regions, e.g., transmembrane domains. The
hydrophilicity of a protein sequence can be easily
determined using readily available programs.
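
As one possible illustration of such an analysis, the Python
sketch below computes a sliding-window hydropathy profile and
reports the most hydrophilic window. The choice of the
Kyte-Doolittle scale, the window size, the function names,
and the toy peptide are assumptions made for this example;
the text above does not specify any particular program or
scale.

```python
# Illustrative sketch: sliding-window hydropathy profile using the
# Kyte-Doolittle scale (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each window of the given size along the sequence."""
    seq = sequence.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 7) -> tuple[int, float]:
    """Start index and mean score of the lowest-scoring (most hydrophilic) window."""
    profile = hydropathy_profile(sequence, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


if __name__ == "__main__":
    # Toy peptide: a charged stretch followed by a hydrophobic stretch.
    print(most_hydrophilic_window("KKKKKIIIII", window=5))
```

A low-scoring window is only a rough indicator of surface
exposure; candidate antigenic peptides would still need to
be evaluated experimentally.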

An immunogen typically is used to prepare antibodies by
immunizing a suitable subject (e.g., rabbit, goat, mouse
or other mammal). An appropriate immunogenic preparation
can contain, for example, recombinantly expressed or
chemically synthesized polypeptide. The preparation can
further include an adjuvant, such as Freund's complete or
incomplete adjuvant, or similar immunostimulatory agent.




WO 00/18904 PCT/US99/22817 


Accordingly, another aspect of the invention pertains
to antibodies directed against a polypeptide of the
invention. The term "antibody" as used herein refers to
immunoglobulin molecules and immunologically active
portions of immunoglobulin molecules, i.e., molecules
that contain an antigen binding site which specifically
binds an antigen, such as a polypeptide of the invention.
A molecule which specifically binds to a given
polypeptide of the invention is a molecule which binds
the polypeptide, but does not substantially bind other
molecules in a sample, e.g., a biological sample, which
naturally contains the polypeptide. Examples of
immunologically active portions of immunoglobulin
molecules include F(ab) and F(ab')2 fragments which can be
generated by treating the antibody with an enzyme such as
pepsin. The invention provides polyclonal and monoclonal
antibodies. The term "monoclonal antibody" or
"monoclonal antibody composition", as used herein, refers
to a population of antibody molecules that contain only
one species of an antigen binding site capable of
immunoreacting with a particular epitope.

Polyclonal antibodies can be prepared as described
above by immunizing a suitable subject with a polypeptide
of the invention as an immunogen. The antibody titer in
the immunized subject can be monitored over time by
standard techniques, such as with an enzyme linked
immunosorbent assay (ELISA) using immobilized
polypeptide. If desired, the antibody molecules can be
isolated from the mammal (e.g., from the blood) and
further purified by well-known techniques, such as
protein A chromatography to obtain the IgG fraction. At
an appropriate time after immunization, e.g., when the
specific antibody titers are highest, antibody-producing
cells can be obtained from the subject and used to
prepare monoclonal antibodies by standard techniques,
such as the hybridoma technique originally described by
Kohler and Milstein (1975) Nature 256:495-497, the human
B cell hybridoma technique (Kozbor et al. (1983) Immunol.
Today 4:72), the EBV-hybridoma technique (Cole et al.
(1985), Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96) or trioma techniques. The
technology for producing hybridomas is well known (see
generally Current Protocols in Immunology (1994) Coligan
et al. (eds.) John Wiley & Sons, Inc., New York, NY).
Hybridoma cells producing a monoclonal antibody of the
invention are detected by screening the hybridoma culture
supernatants for antibodies that bind the polypeptide of
interest, e.g., using a standard ELISA assay.

Alternative to preparing monoclonal antibody-secreting
hybridomas, a monoclonal antibody directed against a
polypeptide of the invention can be identified and
isolated by screening a recombinant combinatorial
immunoglobulin library (e.g., an antibody phage display
library) with the polypeptide of interest. Kits for
generating and screening phage display libraries are
commercially available (e.g., the Pharmacia Recombinant
Phage Antibody System, Catalog No. 27-9400-01; and the
Stratagene SurfZAP™ Phage Display Kit, Catalog No.
240612). Additionally, examples of methods and reagents
particularly amenable for use in generating and screening
antibody display libraries can be found in, for example,
U.S. Patent No. 5,223,409; PCT Publication No. WO
92/18619; PCT Publication No. WO 91/17271; PCT
Publication No. WO 92/20791; PCT Publication No. WO
92/15679; PCT Publication No. WO 93/01288; PCT
Publication No. WO 92/01047; PCT Publication No. WO
92/09690; PCT Publication No. WO 90/02809; Fuchs et al.
(1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum.
Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science
246:1275-1281; Griffiths et al. (1993) EMBO J.
12:725-734.

Additionally, recombinant antibodies, such as chimeric
and humanized monoclonal antibodies, comprising both
human and non-human portions, which can be made using
standard recombinant DNA techniques, are within the scope
of the invention. Such chimeric and humanized monoclonal
antibodies can be produced by recombinant DNA techniques
known in the art, for example using methods described in
PCT Publication No. WO 87/02671; European Patent
Application 184,187; European Patent Application 171,496;
European Patent Application 173,494; PCT Publication No.
WO 86/01533; U.S. Patent No. 4,816,567; European Patent
Application 125,023; Better et al. (1988) Science
240:1041-1043; Liu et al. (1987) Proc. Natl. Acad. Sci.
USA 84:3439-3443; Liu et al. (1987) J. Immunol.
139:3521-3526; Sun et al. (1987) Proc. Natl. Acad. Sci.
USA 84:214-218; Nishimura et al. (1987) Canc. Res.
47:999-1005; Wood et al. (1985) Nature 314:446-449;
Shaw et al. (1988) J. Natl. Cancer Inst. 80:1553-1559;
Morrison (1985) Science 229:1202-1207; Oi et al. (1986)
Bio/Techniques 4:214; U.S. Patent 5,225,539; Jones et al.
(1986) Nature 321:522-525; Verhoeyen et al. (1988)
Science 239:1534; and Beidler et al. (1988) J. Immunol.
141:4053-4060.

Completely human antibodies are particularly desirable
for therapeutic treatment of human patients. Such
antibodies can be produced using transgenic mice which
are incapable of expressing endogenous immunoglobulin
heavy and light chain genes, but which can express human
heavy and light chain genes. The transgenic mice are
immunized in the normal fashion with a selected antigen,
e.g., all or a portion of a polypeptide of the invention.
Monoclonal antibodies directed against the antigen can be
obtained using conventional hybridoma technology. The
human immunoglobulin transgenes harbored by the
transgenic mice rearrange during B cell differentiation,
and subsequently undergo class switching and somatic
mutation. Thus, using such a technique, it is possible
to produce therapeutically useful IgG, IgA and IgE
antibodies. For an overview of this technology for
producing human antibodies, see Lonberg and Huszar (1995,
Int. Rev. Immunol. 13:65-93). For a detailed discussion
of this technology for producing human antibodies and
human monoclonal antibodies and protocols for producing
such antibodies, see, e.g., U.S. Patent 5,625,126; U.S.
Patent 5,633,425; U.S. Patent 5,569,825; U.S. Patent
5,661,016; and U.S. Patent 5,545,806. In addition,
companies such as Abgenix, Inc. (Fremont, CA), can be
engaged to provide human antibodies directed against a
selected antigen using technology similar to that
described above.

Completely human antibodies which recognize a selected
epitope can be generated using a technique referred to as
"guided selection." In this approach a selected
non-human monoclonal antibody, e.g., a murine antibody,
is used to guide the selection of a completely human
antibody recognizing the same epitope.

An antibody directed against a polypeptide of the
invention (e.g., monoclonal antibody) can be used to
isolate the polypeptide by standard techniques, such as
affinity chromatography or immunoprecipitation.
Moreover, such an antibody can be used to detect the
protein (e.g., in a cellular lysate or cell supernatant)
in order to evaluate the abundance and pattern of
expression of the polypeptide. The antibodies can also
be used diagnostically to monitor protein levels in
tissue as part of a clinical testing procedure, e.g., to
determine the efficacy of a given treatment
regimen. Detection can be facilitated by coupling the
antibody to a detectable substance. Examples of
detectable substances include various enzymes, prosthetic
groups, fluorescent materials, luminescent materials,
bioluminescent materials, and radioactive materials.
Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic
group complexes include streptavidin/biotin and
avidin/biotin; examples of suitable fluorescent materials
include umbelliferone, fluorescein, fluorescein
isothiocyanate, rhodamine, dichlorotriazinylamine
fluorescein, dansyl chloride or phycoerythrin; an example
of a luminescent material includes luminol; examples of
bioluminescent materials include luciferase, luciferin,
and aequorin; and examples of suitable radioactive
materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

III. Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors,
preferably expression vectors, containing a nucleic acid
encoding a polypeptide of the invention (or a portion
thereof). As used herein, the term "vector" refers to a
nucleic acid molecule capable of transporting another
nucleic acid to which it has been linked. One type of
vector is a "plasmid", which refers to a circular
double-stranded DNA loop into which additional DNA segments
can be ligated. Another type of vector is a viral vector,
wherein additional DNA segments can be ligated into the
viral genome. Certain vectors are capable of autonomous
replication in a host cell into which they are introduced
(e.g., bacterial vectors having a bacterial origin of
replication and episomal mammalian vectors). Other
vectors (e.g., non-episomal mammalian vectors) are
integrated into the genome of a host cell upon
introduction into the host cell, and thereby are
replicated along with the host genome. Moreover, certain
vectors, expression vectors, are capable of directing the
expression of genes to which they are operably linked.
In general, expression vectors of utility in recombinant
DNA techniques are often in the form of plasmids
(vectors). However, the invention is intended to include
such other forms of expression vectors, such as viral
vectors (e.g., replication defective retroviruses,
adenoviruses and adeno-associated viruses), which serve
equivalent functions.

The recombinant expression vectors of the invention
comprise a nucleic acid of the invention in a form
suitable for expression of the nucleic acid in a host
cell. This means that the recombinant expression vectors
include one or more regulatory sequences, selected on the
basis of the host cells to be used for expression, which
are operably linked to the nucleic acid sequence to be
expressed. Within a recombinant expression vector,
"operably linked" is intended to mean that the nucleotide
sequence of interest is linked to the regulatory
sequence(s) in a manner which allows for expression of
the nucleotide sequence (e.g., in an in vitro
transcription/translation system or in a host cell when
the vector is introduced into the host cell). The term
"regulatory sequence" is intended to include promoters,
enhancers and other expression control elements (e.g.,
polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, CA (1990). Regulatory sequences include those
which direct constitutive expression of a nucleotide
sequence in many types of host cell and those which
direct expression of the nucleotide sequence only in
certain host cells (e.g., tissue-specific regulatory
sequences). It will be appreciated by those skilled in
the art that the design of the expression vector can
depend on such factors as the choice of the host cell to
be transformed, the level of expression of protein
desired, etc. The expression vectors of the invention
can be introduced into host cells to thereby produce
proteins or peptides, including fusion proteins or
peptides, encoded by nucleic acids as described herein.

The recombinant expression vectors of the invention can
be designed for expression of a polypeptide of the
invention in prokaryotic or eukaryotic cells, e.g.,
bacterial cells such as E. coli, insect cells (using
baculovirus expression vectors), yeast cells or mammalian
cells. Suitable host cells are discussed further in
Goeddel, supra. Alternatively, the recombinant
expression vector can be transcribed and translated in
vitro, for example using T7 promoter regulatory sequences
and T7 polymerase.

Expression of proteins in prokaryotes is most often
carried out in E. coli with vectors containing
constitutive or inducible promoters directing the
expression of either fusion or non-fusion proteins.
Fusion vectors add a number of amino acids to a protein
encoded therein, usually to the amino terminus of the
recombinant protein. Such fusion vectors typically serve
three purposes: 1) to increase expression of recombinant
protein; 2) to increase the solubility of the recombinant
protein; and 3) to aid in the purification of the
recombinant protein by acting as a ligand in affinity
purification. Often, in fusion expression vectors, a
proteolytic cleavage site is introduced at the junction
of the fusion moiety and the recombinant protein to
enable separation of the recombinant protein from the
fusion moiety subsequent to purification of the fusion
protein. Such enzymes, and their cognate recognition
sequences, include Factor Xa, thrombin and enterokinase.
Typical fusion expression vectors include pGEX (Pharmacia
Biotech Inc; Smith and Johnson (1988) Gene 67:31-40),
pMAL (New England Biolabs, Beverly, MA) and pRIT5
(Pharmacia, Piscataway, NJ) which fuse glutathione
S-transferase (GST), maltose E binding protein, or protein
A, respectively, to the target recombinant protein.
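
By way of illustration only, the layout of such a fusion
protein and the effect of cleavage at the junction can be
sketched in Python as follows. The sketch is not part of the
methods described herein: the tag and target sequences are
hypothetical placeholders, the function names are invented
for this example, and the recognition motifs (IEGR for
Factor Xa, DDDDK for enterokinase, LVPRGS for thrombin) are
the commonly cited consensus sequences, treated here as
simple string matches.

```python
# Illustrative sketch only: models a fusion protein as a plain amino acid string.
# Motifs are commonly cited consensus sequences; the integer is the assumed
# number of motif residues that stay on the N-terminal (fusion moiety) fragment.
CLEAVAGE_SITES = {
    "factor_xa": ("IEGR", 4),      # assumed to cut after IEGR
    "enterokinase": ("DDDDK", 5),  # assumed to cut after DDDDK
    "thrombin": ("LVPRGS", 4),     # assumed to cut between R and G
}


def build_fusion(fusion_moiety, protease, target):
    """Join fusion moiety, cleavage motif and target protein, N- to C-terminus."""
    motif, _ = CLEAVAGE_SITES[protease]
    return fusion_moiety + motif + target


def cleave(fusion, protease):
    """Split the fusion at the first occurrence of the protease motif.

    Returns [fusion] unchanged when the motif is absent. Real proteases may
    also cut at secondary sites; that is ignored in this simplification.
    """
    motif, offset = CLEAVAGE_SITES[protease]
    start = fusion.find(motif)
    if start == -1:
        return [fusion]
    cut = start + offset
    return [fusion[:cut], fusion[cut:]]


if __name__ == "__main__":
    tag = "MSPILGYW"      # hypothetical stand-in for a fusion moiety such as GST
    target = "MKTAYIAK"   # hypothetical stand-in for the recombinant protein
    fusion = build_fusion(tag, "factor_xa", target)
    print(cleave(fusion, "factor_xa"))
```

In this simplified model, a Factor Xa or enterokinase site
leaves no motif residues on the released protein, whereas the
thrombin motif leaves two residues (GS) at its amino terminus.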

Examples of suitable inducible non-fusion E. coli
expression vectors include pTrc (Amann et al. (1988)
Gene 69:301-315) and pET 11d (Studier et al., Gene
Expression Technology: Methods in Enzymology 185,
Academic Press, San Diego, California (1990) 60-89).
Target gene expression from the pTrc vector relies on
host RNA polymerase transcription from a hybrid trp-lac
fusion promoter. Target gene expression from the pET 11d
vector relies on transcription from a T7 gn10-lac fusion
promoter mediated by a coexpressed viral RNA polymerase
(T7 gn1). This viral polymerase is supplied by host
strains BL21(DE3) or HMS174(DE3) from a resident λ
prophage harboring a T7 gn1 gene under the
transcriptional control of the lacUV5 promoter.

One strategy to maximize recombinant protein expression
in E. coli is to express the protein in a host bacterium
with an impaired capacity to proteolytically cleave the
recombinant protein (Gottesman, Gene Expression
Technology: Methods in Enzymology 185, Academic Press,
San Diego, California (1990) 119-128). Another strategy
is to alter the nucleic acid sequence of the nucleic acid
to be inserted into an expression vector so that the
individual codons for each amino acid are those
preferentially utilized in E. coli (Wada et al. (1992)
Nucleic Acids Res. 20:2111-2118). Such alteration of
nucleic acid sequences of the invention can be carried
out by standard DNA synthesis techniques.
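
As a minimal sketch of the codon alteration strategy, the
following Python example recodes an open reading frame so
that each amino acid is specified by a single codon assumed
to be preferred in E. coli, and confirms that the encoded
protein is unchanged. The preferred-codon table is an
assumption made for this example (one frequently used codon
per amino acid) and is not taken from Wada et al.; the
function names and the input sequence are likewise
hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: one codon per amino acid taken to be preferred in E. coli.
# Illustrative values only; a real design would use measured usage data.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def codons(dna):
    """Split an in-frame DNA sequence into upper-case codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode_for_e_coli(dna):
    """Replace every codon with the assumed preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TABLE[c]] for c in codons(dna))


if __name__ == "__main__":
    original = "ATGAGACTAGGATAG"  # hypothetical ORF encoding M-R-L-G-stop
    recoded = recode_for_e_coli(original)
    print(recoded)
    # The nucleotide sequence changes; the encoded protein does not.
    assert translate(recoded) == translate(original)
```

Using a single codon per amino acid is the simplest possible
choice; it ignores considerations such as mRNA secondary
structure and repeated sequence motifs.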

In another embodiment, the expression vector is a yeast
expression vector. Examples of vectors for expression in
yeast S. cerevisiae include pYepSec1 (Baldari et al.
(1987) EMBO J. 6:229-234), pMFa (Kurjan and Herskowitz
(1982) Cell 30:933-943), pJRY88 (Schultz et al. (1987)
Gene 54:113-123), pYES2 (Invitrogen Corporation, San
Diego, CA), and pPicZ (Invitrogen Corp, San Diego, CA).
Alternatively, the expression vector is a baculovirus
expression vector. Baculovirus vectors available for
expression of proteins in cultured insect cells (e.g.,
Sf9 cells) include the pAc series (Smith et al. (1983) Mol.
Cell Biol. 3:2156-2165) and the pVL series (Luckow and
Summers (1989) Virology 170:31-39).

In yet another embodiment, a nucleic acid of the
invention is expressed in mammalian cells using a
mammalian expression vector. Examples of mammalian
expression vectors include pCDM8 (Seed (1987) Nature
329:840) and pMT2PC (Kaufman et al. (1987) EMBO J.
6:187-195). When used in mammalian cells, the expression
vector's control functions are often provided by viral
regulatory elements. For example, commonly used
promoters are derived from polyoma, Adenovirus 2,
cytomegalovirus and Simian Virus 40. For other suitable
expression systems for both prokaryotic and eukaryotic
cells see chapters 16 and 17 of Sambrook et al., supra.

In another embodiment, the recombinant mammalian
expression vector is capable of directing expression of
the nucleic acid preferentially in a particular cell type
(e.g., tissue-specific regulatory elements are used to
express the nucleic acid). Tissue-specific regulatory
elements are known in the art. Non-limiting examples of
suitable tissue-specific promoters include the albumin
promoter (liver-specific; Pinkert et al. (1987) Genes
Dev. 1:268-277), lymphoid-specific promoters (Calame and
Eaton (1988) Adv. Immunol. 43:235-275), in particular
promoters of T cell receptors (Winoto and Baltimore
(1989) EMBO J. 8:729-733) and immunoglobulins (Banerji et
al. (1983) Cell 33:729-740; Queen and Baltimore (1983)
Cell 33:741-748), neuron-specific promoters (e.g., the
neurofilament promoter; Byrne and Ruddle (1989) Proc.
Natl. Acad. Sci. USA 86:5473-5477), pancreas-specific
promoters (Edlund et al. (1985) Science 230:912-916), and
mammary gland-specific promoters (e.g., milk whey
promoter; U.S. Patent No. 4,873,316 and European
Application Publication No. 264,166). Developmentally-
regulated promoters are also encompassed, for example the
murine hox promoters (Kessel and Gruss (1990) Science
249:374-379) and the α-fetoprotein promoter (Camper and
Tilghman (1989) Genes Dev. 3:537-546).

The invention further provides a recombinant expression
vector comprising a DNA molecule of the invention cloned
into the expression vector in an antisense orientation.
That is, the DNA molecule is operably linked to a
regulatory sequence in a manner which allows for
expression (by transcription of the DNA molecule) of an
RNA molecule which is antisense to the mRNA encoding a
polypeptide of the invention. Regulatory sequences
operably linked to a nucleic acid cloned in the antisense
orientation can be chosen which direct the continuous
expression of the antisense RNA molecule in a variety of
cell types, for instance viral promoters and/or
enhancers, or regulatory sequences can be chosen which
direct constitutive, tissue specific or cell type
specific expression of antisense RNA. The antisense
expression vector can be in the form of a recombinant
plasmid, phagemid or attenuated virus in which antisense
nucleic acids are produced under the control of a high
efficiency regulatory region, the activity of which can
be determined by the cell type into which the vector is
introduced. For a discussion of the regulation of gene
expression using antisense genes see Weintraub et al.
(Reviews - Trends in Genetics, Vol. 1(1) 1986).
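
The relationship between a DNA molecule cloned in the
antisense orientation and the mRNA it targets can be
illustrated with the short Python sketch below. It is
offered only as an illustration under simplifying
assumptions: sequences are plain strings, transcription is
modeled as copying the strand placed downstream of the
promoter with U substituted for T, and all function names
and the example sequence are hypothetical.

```python
# Illustrative sketch only; sequences are written 5' to 3'.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(strand_downstream_of_promoter):
    """Model the transcript as the promoter-proximal strand with U for T."""
    return strand_downstream_of_promoter.upper().replace("T", "U")


def antisense_transcript(sense_cdna):
    """Transcript produced when the cDNA is cloned in antisense orientation.

    Reversing the insert places the reverse complement of the sense strand
    downstream of the promoter, so that strand is what gets copied into RNA.
    """
    return transcribe(reverse_complement(sense_cdna))


def is_antisense_to(mrna, rna):
    """True when rna is exactly complementary (antiparallel) to mrna."""
    return rna == mrna.translate(RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = "ATGGCC"  # hypothetical fragment of a sense-strand cDNA
    mrna = transcribe(sense)
    anti = antisense_transcript(sense)
    print(mrna, anti, is_antisense_to(mrna, anti))
```

The same insert cloned in the sense orientation would yield a
transcript identical to the mRNA rather than complementary to
it, which is why orientation relative to the regulatory
sequence determines whether antisense RNA is produced.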

Another aspect of the invention pertains to host cells
into which a recombinant expression vector of the
invention has been introduced. The terms "host cell" and
"recombinant host cell" are used interchangeably herein.
It is understood that such terms refer not only to the
particular subject cell but to the progeny or potential
progeny of such a cell. Because certain modifications
may occur in succeeding generations due to either
mutation or environmental influences, such progeny may
not, in fact, be identical to the parent cell, but are
still included within the scope of the term as used
herein.

A host cell can be any prokaryotic (e.g., E. coli) or
eukaryotic (e.g., an insect cell, a yeast cell or a
mammalian cell) cell.

Vector DNA can be introduced into prokaryotic or
eukaryotic cells via conventional transformation or
transfection techniques. As used herein, the terms
"transformation" and "transfection" are intended to refer
to a variety of art-recognized techniques for introducing
foreign nucleic acid into a host cell, including calcium
phosphate or calcium chloride co-precipitation,
DEAE-dextran-mediated transfection, lipofection, or
electroporation. Suitable methods for transforming or
transfecting host cells can be found in Sambrook et al.
(supra), and other laboratory manuals.

For stable transfection of mammalian cells, it is known
that, depending upon the expression vector and
transfection technique used, only a small fraction of
cells may integrate the foreign DNA into their genome.
In order to identify and select these integrants, a gene
that encodes a selectable marker (e.g., for resistance to
antibiotics) is generally introduced into the host cells
along with the gene of interest. Preferred selectable
markers include those which confer resistance to drugs,



WO 00/18904 PCT/US99/22817

such as G418, hygromycin and methotrexate. Cells stably
transfected with the introduced nucleic acid can be
identified by drug selection (e.g., cells that have
incorporated the selectable marker gene will survive,
while the other cells die).

A host cell of the invention, such as a prokaryotic or
eukaryotic host cell in culture, can be used to produce a
polypeptide of the invention. Accordingly, the invention
further provides methods for producing a polypeptide of
the invention using the host cells of the invention. In
one embodiment, the method comprises culturing the host
cell of the invention (into which a recombinant expression
vector encoding a polypeptide of the invention has been
introduced) in a suitable medium such that the
polypeptide is produced. In another embodiment, the
method further comprises isolating the polypeptide from
the medium or the host cell.

The host cells of the invention can also be used to
produce non-human transgenic animals. For example, in one
embodiment, a host cell of the invention is a fertilized
oocyte or an embryonic stem cell into which sequences
encoding a polypeptide of the invention have been
introduced. Such host cells can then be used to create
non-human transgenic animals in which exogenous sequences
encoding a polypeptide of the invention have been
introduced into their genome or homologous recombinant
animals in which endogenous sequences encoding a
polypeptide of the invention have been altered. Such
animals are useful for studying the function and/or
activity of the polypeptide and for identifying and/or
evaluating modulators of polypeptide activity. As used
herein, a "transgenic animal" is a non-human animal,
preferably a mammal, more preferably a rodent such as a
rat or mouse, in which one or more of the cells of the
animal includes a transgene. Other examples of transgenic
animals include non-human primates, sheep, dogs, cows,
goats, chickens, amphibians, etc. A transgene is exogenous
DNA which is integrated into the genome of a cell from
which a transgenic animal develops and which remains in
the genome of the mature animal, thereby directing the
expression of an encoded gene product in one or more cell
types or tissues of the transgenic animal. As used
herein, a "homologous recombinant animal" is a non-human
animal, preferably a mammal, more preferably a mouse, in
which an endogenous gene has been altered by homologous
recombination between the endogenous gene and an
exogenous DNA molecule introduced into a cell of the
animal, e.g., an embryonic cell of the animal, prior to
development of the animal.

A transgenic animal of the invention can be created by
introducing nucleic acid encoding a polypeptide of the
invention (or a homologue thereof) into the male
pronuclei of a fertilized oocyte, e.g., by
microinjection or retroviral infection, and allowing the
oocyte to develop in a pseudopregnant female foster
animal. Intronic sequences and polyadenylation signals
can also be included in the transgene to increase the
efficiency of expression of the transgene. A
tissue-specific regulatory sequence(s) can be operably
linked to the transgene to direct expression of the
polypeptide of the invention to particular cells. Methods
for generating transgenic animals via embryo manipulation
and microinjection, particularly animals such as mice,
have become conventional in the art and are described, for
example, in U.S. Patent Nos. 4,736,866 and 4,870,009,
U.S. Patent No. 4,873,191 and in Hogan, Manipulating the
Mouse Embryo (Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y., 1986). Similar methods are used for
production of other transgenic animals. A transgenic
founder animal can be identified based upon the presence
of the transgene in its genome and/or expression of mRNA
encoding the transgene in tissues or cells of the
animals. A transgenic founder animal can then be used to
breed additional animals carrying the transgene.
Moreover, transgenic animals carrying the transgene can
further be bred to other transgenic animals carrying
other transgenes.

To create a homologous recombinant animal, a vector is
prepared which contains at least a portion of a gene
encoding a polypeptide of the invention into which a
deletion, addition or substitution has been introduced to
thereby alter, e.g., functionally disrupt, the gene. In
a preferred embodiment, the vector is designed such that,
upon homologous recombination, the endogenous gene is
functionally disrupted (i.e., no longer encodes a
functional protein; also referred to as a "knock out"
vector). Alternatively, the vector can be designed such
that, upon homologous recombination, the endogenous gene
is mutated or otherwise altered but still encodes
functional protein (e.g., the upstream regulatory region
can be altered to thereby alter the expression of the
endogenous protein). In the homologous recombination
vector, the altered portion of the gene is flanked at its
5' and 3' ends by additional nucleic acid of the gene to
allow for homologous recombination to occur between the
exogenous gene carried by the vector and an endogenous
gene in an embryonic stem cell. The additional flanking
nucleic acid sequences are of sufficient length for
successful homologous recombination with the endogenous
gene. Typically, several kilobases of flanking DNA (both
at the 5' and 3' ends) are included in the vector (see,
e.g., Thomas and Capecchi (1987) Cell 51:503 for a
description of homologous recombination vectors). The
vector is introduced into an embryonic stem cell line
(e.g., by electroporation) and cells in which the
introduced gene has homologously recombined with the
endogenous gene are selected (see, e.g., Li et al. (1992)
Cell 69:915). The selected cells are then injected into
a blastocyst of an animal (e.g., a mouse) to form
aggregation chimeras (see, e.g., Bradley in
Teratocarcinomas and Embryonic Stem Cells: A Practical
Approach, Robertson, ed. (IRL, Oxford, 1987) pp.
113-152). A chimeric embryo can then be implanted into a
suitable pseudopregnant female foster animal and the
embryo brought to term. Progeny harboring the
homologously recombined DNA in their germ cells can be
used to breed animals in which all cells of the animal
contain the homologously recombined DNA by germline
transmission of the transgene. Methods for constructing
homologous recombination vectors and homologous
recombinant animals are described further in Bradley
(1991) Current Opinion in Bio/Technology 2:823-829 and in
PCT Publication Nos. WO 90/11354, WO 91/01140, WO
92/0968, and WO 93/04169.

In another embodiment, transgenic non-human animals can
be produced which contain selected systems which allow
for regulated expression of the transgene. One example
of such a system is the cre/loxP recombinase system of
bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso et al. (1992) Proc.
Natl. Acad. Sci. USA 89:6232-6236. Another example of a
recombinase system is the FLP recombinase system of
Saccharomyces cerevisiae (O'Gorman et al. (1991) Science
251:1351-1355). If a cre/loxP recombinase system is used
to regulate expression of the transgene, animals
containing transgenes encoding both the Cre recombinase
and a selected protein are required. Such animals can be
provided through the construction of "double" transgenic
animals, e.g., by mating two transgenic animals, one
containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described
herein can also be produced according to the methods
described in Wilmut et al. (1997) Nature 385:810-813 and
PCT Publication Nos. WO 97/07668 and WO 97/07669.

IV. Pharmaceutical Compositions 

The nucleic acid molecules, polypeptides, and
antibodies (also referred to herein as "active
compounds") of the invention can be incorporated into
pharmaceutical compositions suitable for administration.
Such compositions typically comprise the nucleic acid
molecule, protein, or antibody and a pharmaceutically
acceptable carrier. As used herein the language
"pharmaceutically acceptable carrier" is intended to
include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and
absorption delaying agents, and the like, compatible with
pharmaceutical administration. The use of such media and
agents for pharmaceutically active substances is well
known in the art. Except insofar as any conventional
media or agent is incompatible with the active compound,
use thereof in the compositions is contemplated.
Supplementary active compounds can also be incorporated
into the compositions.

The invention includes methods for preparing
pharmaceutical compositions for modulating the expression
or activity of a polypeptide or nucleic acid of the
invention. Such methods comprise formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention. Such compositions can
further include additional active agents. Thus, the
invention further includes methods for preparing a
pharmaceutical composition by formulating a
pharmaceutically acceptable carrier with an agent which
modulates expression or activity of a polypeptide or
nucleic acid of the invention and one or more additional
active compounds.

A pharmaceutical composition of the invention is
formulated to be compatible with its intended route of
administration. Examples of routes of administration
include parenteral, e.g., intravenous, intradermal,
subcutaneous, oral (e.g., inhalation), transdermal
(topical), transmucosal, and rectal administration.
Solutions or suspensions used for parenteral,
intradermal, or subcutaneous application can include the
following components: a sterile diluent such as water for
injection, saline solution, fixed oils, polyethylene
glycols, glycerine, propylene glycol or other synthetic
solvents; antibacterial agents such as benzyl alcohol or
methyl parabens; antioxidants such as ascorbic acid or
sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid; buffers such as
acetates, citrates or phosphates; and agents for the
adjustment of tonicity such as sodium chloride or
dextrose. pH can be adjusted with acids or bases, such
as hydrochloric acid or sodium hydroxide. The parenteral
preparation can be enclosed in ampoules, disposable
syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use
include sterile aqueous solutions (where water soluble)
or dispersions and sterile powders for the extemporaneous
preparation of sterile injectable solutions or
dispersions. For intravenous administration, suitable
carriers include physiological saline, bacteriostatic
water, Cremophor EL™ (BASF; Parsippany, NJ) or phosphate
buffered saline (PBS). In all cases, the composition
must be sterile and should be fluid to the extent that
easy syringability exists. It must be stable under the
conditions of manufacture and storage and must be
preserved against the contaminating action of
microorganisms such as bacteria and fungi. The carrier
can be a solvent or dispersion medium containing, for
example, water, ethanol, polyol (for example, glycerol,
propylene glycol, liquid polyethylene glycol, and
the like), and suitable mixtures thereof. The proper
fluidity can be maintained, for example, by the use of a
coating such as lecithin, by the maintenance of the
required particle size in the case of dispersion and by
the use of surfactants. Prevention of the action of
microorganisms can be achieved by various antibacterial
and antifungal agents, for example, parabens,
chlorobutanol, phenol, ascorbic acid, thimerosal, and the
like. In many cases, it will be preferable to include
isotonic agents, for example, sugars, polyalcohols such
as mannitol or sorbitol, or sodium chloride in the
composition. Prolonged absorption of the injectable
compositions can be brought about by including in the
composition an agent which delays absorption, for
example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by
incorporating the active compound (e.g., a polypeptide or
antibody) in the required amount in an appropriate
solvent with one or a combination of ingredients
enumerated above, as required, followed by filtered
sterilization. Generally, dispersions are prepared by
incorporating the active compound into a sterile vehicle
which contains a basic dispersion medium and the required
other ingredients from those enumerated above. In the
case of sterile powders for the preparation of sterile
injectable solutions, the preferred methods of
preparation are vacuum drying and freeze-drying, which
yield a powder of the active ingredient plus any
additional desired ingredient from a previously
sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or
an edible carrier. They can be enclosed in gelatin
capsules or compressed into tablets. For the purpose of
oral therapeutic administration, the active compound can
be incorporated with excipients and used in the form of
tablets, troches, or capsules. Oral compositions can
also be prepared using a fluid carrier for use as a
mouthwash, wherein the compound in the fluid carrier is
applied orally and swished and expectorated or swallowed.
Pharmaceutically compatible binding agents and/or
adjuvant materials can be included as part of the
composition. The tablets, pills, capsules, troches and
the like can contain any of the following ingredients, or
compounds of a similar nature: a binder such as
microcrystalline cellulose, gum tragacanth or gelatin; an
excipient such as starch or lactose; a disintegrating
agent such as alginic acid, Primogel, or corn starch; a
lubricant such as magnesium stearate or Sterotes; a
glidant such as colloidal silicon dioxide; a sweetening
agent such as sucrose or saccharin; or a flavoring agent
such as peppermint, methyl salicylate, or orange
flavoring.

For administration by inhalation, the compounds are
delivered in the form of an aerosol spray from a
pressurized container or dispenser which contains a
suitable propellant, e.g., a gas such as carbon dioxide,
or a nebulizer.

Systemic administration can also be by transmucosal or
transdermal means. For transmucosal or transdermal
administration, penetrants appropriate to the barrier to
be permeated are used in the formulation. Such
penetrants are generally known in the art, and include,
for example, for transmucosal administration, detergents,
bile salts, and fusidic acid derivatives. Transmucosal
administration can be accomplished through the use of
nasal sprays or suppositories. For transdermal
administration, the active compounds are formulated into
ointments, salves, gels, or creams as generally known in
the art.

The compounds can also be prepared in the form of
suppositories (e.g., with conventional suppository bases
such as cocoa butter and other glycerides) or retention
enemas for rectal delivery.

In one embodiment, the active compounds are prepared
with carriers that will protect the compound against
rapid elimination from the body, such as a controlled
release formulation, including implants and
microencapsulated delivery systems. Biodegradable,
biocompatible polymers can be used, such as ethylene
vinyl acetate, polyanhydrides, polyglycolic acid,
collagen, polyorthoesters, and polylactic acid. Methods
for preparation of such formulations will be apparent to
those skilled in the art. The materials can also be
obtained commercially from Alza Corporation and Nova
Pharmaceuticals, Inc. Liposomal suspensions (including
liposomes targeted to infected cells with monoclonal
antibodies to viral antigens) can also be used as
pharmaceutically acceptable carriers. These can be
prepared according to methods known to those skilled in
the art, for example, as described in U.S. Patent No.
4,522,811.

It is especially advantageous to formulate oral or
parenteral compositions in dosage unit form for ease of
administration and uniformity of dosage. Dosage unit
form as used herein refers to physically discrete units
suited as unitary dosages for the subject to be treated,
each unit containing a predetermined quantity of active
compound calculated to produce the desired therapeutic
effect in association with the required pharmaceutical
carrier. The specifications for the dosage unit forms of
the invention are dictated by and directly dependent on
the unique characteristics of the active compound and the
particular therapeutic effect to be achieved, and the
limitations inherent in the art of compounding such an
active compound for the treatment of individuals.

For antibodies, the preferred dosage is 0.1 mg/kg to
100 mg/kg of body weight (generally 10 mg/kg to 20
mg/kg). If the antibody is to act in the brain, a dosage
of 50 mg/kg to 100 mg/kg is usually appropriate.
Generally, partially human antibodies and fully human
antibodies have a longer half-life within the human body
than other antibodies. Accordingly, lower dosages and
less frequent administration are often possible.
Modifications such as lipidation can be used to stabilize
antibodies and to enhance uptake and tissue penetration
(e.g., into the brain). A method for lipidation of
antibodies is described by Cruikshank et al. ((1997) J.
Acquired Immune Deficiency Syndromes and Human
Retrovirology 14:193).

The nucleic acid molecules of the invention can be
inserted into vectors and used as gene therapy vectors.
Gene therapy vectors can be delivered to a subject by,
for example, intravenous injection, local administration
(U.S. Patent 5,328,470), or stereotactic injection
(see, e.g., Chen et al. (1994) Proc. Natl. Acad. Sci. USA
91:3054-3057). The pharmaceutical preparation of the
gene therapy vector can include the gene therapy vector
in an acceptable diluent, or can comprise a slow release
matrix in which the gene delivery vehicle is imbedded.
Alternatively, where the complete gene delivery vector
can be produced intact from recombinant cells, e.g.,
retroviral vectors, the pharmaceutical preparation can
include one or more cells which produce the gene delivery
system.

The pharmaceutical compositions can be included in a
container, pack, or dispenser together with instructions
for administration.

V. Uses and Methods of the Invention 

The nucleic acid molecules, proteins, protein
homologues, and antibodies described herein can be used
in one or more of the following methods: a) screening
assays; b) detection assays (e.g., chromosomal mapping,
tissue typing, forensic biology); c) predictive medicine
(e.g., diagnostic assays, prognostic assays, monitoring
clinical trials, and pharmacogenomics); and d) methods of
treatment (e.g., therapeutic and prophylactic). For
example, polypeptides of the invention can be used to (i)
modulate cellular proliferation; (ii) modulate cellular
differentiation; and (iii) modulate cell survival. The
isolated nucleic acid molecules of the invention can be
used to express proteins (e.g., via a recombinant
expression vector in a host cell in gene therapy
applications), to detect mRNA (e.g., in a biological
sample) or a genetic lesion, and to modulate activity of
a polypeptide of the invention. In addition, the
polypeptides of the invention can be used to screen drugs
or compounds which modulate activity or expression of a
polypeptide of the invention as well as to treat
disorders characterized by insufficient or excessive
production of a protein of the invention or production of
a form of a protein of the invention which has decreased
or aberrant activity compared to the wild type protein.
In addition, the antibodies of the invention can be used
to detect and isolate a protein of the invention and
modulate activity of a protein of the invention.

This invention further pertains to novel agents
identified by the above-described screening assays and
uses thereof for treatments as described herein.



A. Screening Assays

The invention provides a method (also referred to
herein as a "screening assay") for identifying
modulators, i.e., candidate or test compounds or agents
(e.g., peptides, peptidomimetics, small molecules or
other drugs) which bind to a polypeptide of the invention
or have a stimulatory or inhibitory effect on, for
example, expression or activity of a polypeptide of the
invention.

In one embodiment, the invention provides assays for
screening candidate or test compounds which bind to or
modulate the activity of the membrane-bound form of a
polypeptide of the invention or biologically active
portion thereof. The test compounds of the present
invention can be obtained using any of the numerous
approaches in combinatorial library methods known in the
art, including: biological libraries; spatially
addressable parallel solid phase or solution phase
libraries; synthetic library methods requiring
deconvolution; the "one-bead one-compound" library
method; and synthetic library methods using affinity
chromatography selection. The biological library
approach is limited to peptide libraries, while the other
four approaches are applicable to peptide, non-peptide
oligomer or small molecule libraries of compounds (Lam
(1997) Anticancer Drug Des. 12:145).

Examples of methods for the synthesis of molecular
libraries can be found in the art, for example in:
DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909;
Erb et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422;
Zuckermann et al. (1994) J. Med. Chem. 37:2678; Cho et
al. (1993) Science 261:1303; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2061; and Gallop et al. (1994) J.
Med. Chem. 37:1233.

Libraries of compounds may be presented in solution
(e.g., Houghten (1992) Bio/Techniques 13:412-421), or on
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993)
Nature 364:555-556), bacteria (U.S. Patent No.
5,223,409), spores (Patent Nos. 5,571,698; 5,403,484; and
5,223,409), plasmids (Cull et al. (1992) Proc. Natl.
Acad. Sci. USA 89:1865-1869), or phage (Scott and Smith
(1990) Science 249:386-390; Devlin (1990) Science
249:404-406; Cwirla et al. (1990) Proc. Natl. Acad. Sci.
USA 87:6378-6382; and Felici (1991) J. Mol. Biol.
222:301-310).

In one embodiment, an assay is a cell-based assay in
which a cell which expresses a membrane-bound form of a
polypeptide of the invention, or a biologically active
portion thereof, on the cell surface is contacted with a
test compound and the ability of the test compound to
bind to the polypeptide is determined. The cell, for
example, can be a yeast cell or a cell of mammalian
origin. Determining the ability of the test compound to
bind to the polypeptide can be accomplished, for example,
by coupling the test compound with a radioisotope or
enzymatic label such that binding of the test compound to
the polypeptide or biologically active portion thereof
can be determined by detecting the labeled compound in a
complex. For example, test compounds can be labeled with
125I, 35S, 14C, or 3H, either directly or indirectly, and
the radioisotope detected by direct counting of
radioemission or by scintillation counting.
Alternatively, test compounds can be enzymatically
labeled with, for example, horseradish peroxidase,
alkaline phosphatase, or luciferase, and the enzymatic
label detected by determination of conversion of an
appropriate substrate to product. In a preferred
embodiment, the assay comprises contacting a cell which
expresses a membrane-bound form of a polypeptide of the
invention, or a biologically active portion thereof, on
the cell surface with a known compound which binds the
polypeptide to form an assay mixture, contacting the
assay mixture with a test compound, and determining the
ability of the test compound to interact with the
polypeptide, wherein determining the ability of the test
compound to interact with the polypeptide comprises
determining the ability of the test compound to
preferentially bind to the polypeptide or a biologically
active portion thereof as compared to the known compound.
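
By way of illustration only, preferential binding of a test
compound relative to a labeled known compound might be
summarized as the percentage of specifically bound label that
the test compound displaces. The following is a minimal sketch
under stated assumptions: the function name, the use of a
background (nonspecific) signal, and the example counts are
hypothetical and are not taken from the specification.

```python
def percent_displacement(signal_without_test, signal_with_test, background=0.0):
    """Percent of specifically bound labeled known compound displaced by a test compound.

    All three arguments are label readouts in the same arbitrary units
    (for example, scintillation counts). `background` is an assumed
    nonspecific signal measured separately.
    """
    specific_without = signal_without_test - background
    specific_with = signal_with_test - background
    if specific_without <= 0:
        raise ValueError("no specific binding of the known compound to displace")
    return 100.0 * (specific_without - specific_with) / specific_without


# Hypothetical readouts: 1000 counts without the test compound,
# 400 counts with it, 200 counts of background.
displaced = percent_displacement(1000, 400, background=200)
```

A higher percentage suggests that the test compound competes more
effectively with the known compound for the polypeptide; any
threshold for calling a compound a binder would have to be set
empirically.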

In another embodiment, an assay is a cell-based assay
comprising contacting a cell expressing a membrane-bound
form of a polypeptide of the invention, or a biologically
active portion thereof, on the cell surface with a test
compound and determining the ability of the test compound
to modulate (e.g., stimulate or inhibit) the activity of
the polypeptide or biologically active portion thereof.
Determining the ability of the test compound to modulate
the activity of the polypeptide or a biologically active
portion thereof can be accomplished, for example, by
determining the ability of the polypeptide to
bind to or interact with a target molecule.

Determining the ability of a polypeptide of the
invention to bind to or interact with a target molecule
can be accomplished by one of the methods described above
for determining direct binding. As used herein, a
"target molecule" is a molecule with which a selected
polypeptide (e.g., a polypeptide of the invention) binds
or interacts in nature, for example, a molecule on
the surface of a cell which expresses the selected
protein, a molecule on the surface of a second cell, a
molecule in the extracellular milieu, a molecule
associated with the internal surface of a cell membrane,
or a cytoplasmic molecule. A target molecule can be a
polypeptide of the invention or some other polypeptide or
protein. For example, a target molecule can be a
component of a signal transduction pathway which
facilitates transduction of an extracellular signal
(e.g., a signal generated by binding of a compound to a
polypeptide of the invention) through the cell membrane
and into the cell or a second intercellular protein which
has catalytic activity or a protein which facilitates the
association of downstream signaling molecules with a
polypeptide of the invention. Determining the ability of
a polypeptide of the invention to bind to or interact
with a target molecule can be accomplished by determining
the activity of the target molecule. For example, the
activity of the target molecule can be determined by
detecting induction of a cellular second messenger of the
target (e.g., intracellular Ca2+, diacylglycerol, IP3,
etc.), detecting catalytic/enzymatic activity of the
target on an appropriate substrate, detecting the
induction of a reporter gene (e.g., a regulatory element
that is responsive to a polypeptide of the invention
operably linked to a nucleic acid encoding a detectable
marker, e.g., luciferase), or detecting a cellular
response, for example, cellular differentiation or cell
proliferation.

In yet another embodiment, an assay of the present
invention is a cell-free assay comprising contacting a
polypeptide of the invention or biologically active
portion thereof with a test compound and determining the
ability of the test compound to bind to the polypeptide
or biologically active portion thereof. Binding of the
test compound to the polypeptide can be determined either
directly or indirectly as described above. In a
preferred embodiment, the assay includes contacting the
polypeptide of the invention or biologically active
portion thereof with a known compound which binds the
polypeptide to form an assay mixture, contacting the
assay mixture with a test compound, and determining the
ability of the test compound to interact with the
polypeptide, wherein determining the ability of the test
compound to interact with the polypeptide comprises
determining the ability of the test compound to
preferentially bind to the polypeptide or biologically
active portion thereof as compared to the known compound.

In another embodiment, an assay is a cell-free assay
comprising contacting a polypeptide of the invention or
biologically active portion thereof with a test compound
and determining the ability of the test compound to
modulate (e.g., stimulate or inhibit) the activity of the
polypeptide or biologically active portion thereof.
Determining the ability of the test compound to modulate
the activity of the polypeptide can be accomplished, for
example, by determining the ability of the polypeptide to
bind to a target molecule by one of the methods described
above for determining direct binding. In an alternative
embodiment, determining the ability of the test compound
to modulate the activity of the polypeptide can be
accomplished by determining the ability of the
polypeptide of the invention to further modulate the
target molecule. For example, the catalytic/enzymatic
activity of the target molecule on an appropriate
substrate can be determined as previously described.

In yet another embodiment, the cell-free assay
comprises contacting a polypeptide of the invention or
biologically active portion thereof with a known compound
which binds the polypeptide to form an assay mixture,
contacting the assay mixture with a test compound, and
determining the ability of the test compound to interact
with the polypeptide, wherein determining the ability of
the test compound to interact with the polypeptide
comprises determining the ability of the polypeptide to
preferentially bind to or modulate the activity of a
target molecule.

The cell-free assays of the present invention are
amenable to use of either a soluble form or the
membrane-bound form of a polypeptide of the invention. In the
case of cell-free assays comprising the membrane-bound
form of the polypeptide, it may be desirable to utilize a
solubilizing agent such that the membrane-bound form of
the polypeptide is maintained in solution. Examples of
such solubilizing agents include non-ionic detergents
such as n-octylglucoside, n-dodecylglucoside,
n-dodecylmaltoside, octanoyl-N-methylglucamide,
decanoyl-N-methylglucamide, Triton X-100, Triton X-114, Thesit,
Isotridecylpoly(ethylene glycol ether)n,
3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate
(CHAPS), 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-
1-propane sulfonate (CHAPSO), or
N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate.

In more than one embodiment of the above assay methods
of the present invention, it may be desirable to
immobilize either the polypeptide of the invention or its
target molecule to facilitate separation of complexed
from uncomplexed forms of one or both of the proteins, as
well as to accommodate automation of the assay. Binding
of a test compound to the polypeptide, or interaction of
the polypeptide with a target molecule in the presence
and absence of a candidate compound, can be accomplished
in any vessel suitable for containing the reactants.
Examples of such vessels include microtitre plates, test
tubes, and micro-centrifuge tubes. In one embodiment, a
fusion protein can be provided which adds a domain that
allows one or both of the proteins to be bound to a
matrix. For example, glutathione-S-transferase fusion
proteins can be adsorbed onto glutathione sepharose
beads (Sigma Chemical; St. Louis, MO) or glutathione
derivatized microtitre plates, which are then combined
with the test compound or the test compound and either
the non-adsorbed target protein or a polypeptide of the
invention, and the mixture incubated under conditions
conducive to complex formation (e.g., at physiological
conditions for salt and pH). Following incubation, the
beads or microtitre plate wells are washed to remove any
unbound components and complex formation is measured
either directly or indirectly, for example, as described
above. Alternatively, the complexes can be dissociated
from the matrix, and the level of binding or activity of
the polypeptide of the invention can be determined using
standard techniques.

Other techniques for immobilizing proteins on matrices
can also be used in the screening assays of the
invention. For example, either the polypeptide of the
invention or its target molecule can be immobilized
utilizing conjugation of biotin and streptavidin.
Biotinylated polypeptide of the invention or target
molecules can be prepared from biotin-NHS
(N-hydroxy-succinimide) using techniques well known in the art
(e.g., biotinylation kit, Pierce Chemicals; Rockford,
IL), and immobilized in the wells of streptavidin-coated
96 well plates (Pierce Chemical). Alternatively,
antibodies reactive with the polypeptide of the invention
or target molecules but which do not interfere with
binding of the polypeptide of the invention to its target
molecule can be derivatized to the wells of the plate,
and unbound target or polypeptide of the invention
trapped in the wells by antibody conjugation. Methods
for detecting such complexes, in addition to those
described above for the GST-immobilized complexes,
include immunodetection of complexes using antibodies
reactive with the polypeptide of the invention or target
molecule, as well as enzyme-linked assays which rely on
detecting an enzymatic activity associated with the
polypeptide of the invention or target molecule.

In another embodiment, modulators of expression of a
polypeptide of the invention are identified in a method
in which a cell is contacted with a candidate compound
and the expression of the selected mRNA or protein (i.e.,
the mRNA or protein corresponding to a polypeptide or
nucleic acid of the invention) in the cell is determined.
The level of expression of the selected mRNA or protein
in the presence of the candidate compound is compared to
the level of expression of the selected mRNA or protein
in the absence of the candidate compound. The candidate
compound can then be identified as a modulator of
expression of the polypeptide of the invention based on
this comparison. For example, when expression of the
selected mRNA or protein is greater (statistically
significantly greater) in the presence of the candidate
compound than in its absence, the candidate compound is
identified as a stimulator of the selected mRNA or
protein expression. Alternatively, when expression of
the selected mRNA or protein is less (statistically
significantly less) in the presence of the candidate
compound than in its absence, the candidate compound is
identified as an inhibitor of the selected mRNA or
protein expression. The level of the selected mRNA or
protein expression in the cells can be determined by
methods described herein.
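
The comparison described above can be sketched, purely as an
illustration, with a permutation test on replicate expression
measurements. The specification does not name a statistical
test; the choice of a permutation test, the significance level
of 0.05, the function names, and the example values below are
all assumptions made for this sketch.

```python
import random
from statistics import mean


def permutation_p_value(treated, untreated, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the difference in mean expression."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(untreated))
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_permutations):
        # Reassign measurements to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def classify_candidate(treated, untreated, alpha=0.05):
    """Label a candidate compound from expression with and without it."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"


# Hypothetical replicate expression levels (arbitrary units).
with_compound = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0]
without_compound = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
label = classify_candidate(with_compound, without_compound)
```

In practice the test and its threshold would be chosen to suit
the assay design and the number of replicates.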

In yet another aspect of the invention, a polypeptide
of the invention can be used as a "bait protein" in a
two-hybrid assay or three-hybrid assay (see, e.g., U.S.
Patent No. 5,283,317; Zervos et al. (1993) Cell 72:223-
232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054;
Bartel et al. (1993) Bio/Techniques 14:920-924; Iwabuchi
et al. (1993) Oncogene 8:1693-1696; and PCT Publication
No. WO 94/10300), to identify other proteins which bind
to or interact with the polypeptide of the invention and
modulate activity of the polypeptide of the invention.
Such binding proteins are also likely to be involved in
the propagation of signals by the polypeptide of the
invention as, for example, upstream or downstream
elements of a signaling pathway involving the polypeptide
of the invention.

This invention further pertains to novel agents
identified by the above-described screening assays and
uses thereof for treatments as described herein.

B. Detection Assays 

Portions or fragments of the cDNA sequences identified
herein (and the corresponding complete gene sequences)
can be used in numerous ways as polynucleotide reagents.
For example, these sequences can be used to: (i) map
their respective genes on a chromosome and, thus, locate
gene regions associated with genetic disease; (ii)
identify an individual from a minute biological sample
(tissue typing); and (iii) aid in forensic identification
of a biological sample. These applications are described
in the subsections below.

1. Chromosome Mapping

Once the sequence (or a portion of the sequence) of a
gene has been isolated, this sequence can be used to map
the location of the gene on a chromosome. Accordingly,
nucleic acid molecules described herein or fragments
thereof, can be used to map the location of the
corresponding genes on a chromosome. The mapping of the
sequences to chromosomes is an important first step in
correlating these sequences with genes associated with
disease.

Briefly, genes can be mapped to chromosomes by
preparing PCR primers (preferably 15-25 bp in length)
from the sequence of a gene of the invention. Computer
analysis of the sequence of a gene of the invention can
be used to rapidly select primers that do not span more
than one exon in the genomic DNA; primers that span more
than one exon would complicate the amplification
process. These primers can then be used
for PCR screening of somatic cell hybrids containing
individual human chromosomes. Only those hybrids
containing the human gene corresponding to the gene
sequences will yield an amplified fragment. For a review
of this technique, see D'Eustachio et al. ((1983) Science
220:919-924).
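
The computer analysis mentioned above might, for example,
enumerate candidate primers that lie wholly within a single
exon. The following is a minimal sketch under stated
assumptions: the function name and the representation of exon
structure as a list of 0-based exon start positions in the
cDNA are hypothetical, and real primer design would also
screen for melting temperature, self-complementarity, and
similar properties.

```python
def primers_within_single_exon(cdna, exon_starts, length=20):
    """Yield (start, primer) for each window of `length` bases inside one exon.

    `exon_starts` lists the 0-based cDNA position at which each exon
    begins, in ascending order and starting with 0. This exon map is an
    assumed input, e.g. from aligning the cDNA to genomic DNA.
    """
    boundaries = list(exon_starts) + [len(cdna)]
    for exon_begin, exon_end in zip(boundaries, boundaries[1:]):
        # Windows starting later than this would cross into the next exon.
        for start in range(exon_begin, exon_end - length + 1):
            yield start, cdna[start:start + length]


# Hypothetical 10-base cDNA with an exon junction after base 6.
candidates = list(primers_within_single_exon("ACGTACGTAC", [0, 6], length=4))
```

With the preferred primer length of 15-25 bp, exons shorter
than the chosen length simply contribute no candidates.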

PCR mapping of somatic cell hybrids is a rapid
procedure for assigning a particular sequence to a
particular chromosome. Three or more sequences can be
assigned per day using a single thermal cycler. Using
the nucleic acid sequences of the invention to design
oligonucleotide primers, sublocalization can be achieved
with panels of fragments from specific chromosomes.
Other mapping strategies which can similarly be used to
map a gene to its chromosome include in situ
hybridization (described in Fan et al. (1990) Proc. Natl.
Acad. Sci. USA 87:6223-27), pre-screening with labeled
flow-sorted chromosomes, and pre-selection by
hybridization to chromosome specific cDNA libraries.

Fluorescence in situ hybridization (FISH) of a DNA
sequence to a metaphase chromosomal spread can further be
used to provide a precise chromosomal location in one
step. For a review of this technique, see Verma et al.
(Human Chromosomes: A Manual of Basic Techniques
(Pergamon Press, New York, 1988)).

Reagents for chromosome mapping can be used
individually to mark a single chromosome or a single site
on that chromosome, or panels of reagents can be used for
marking multiple sites and/or multiple chromosomes.
Reagents corresponding to noncoding regions of the genes
actually are preferred for mapping purposes. Coding
sequences are more likely to be conserved within gene
families, thus increasing the chance of cross
hybridizations during chromosomal mapping.

Once a sequence has been mapped to a precise
chromosomal location, the physical position of the
sequence on the chromosome can be correlated with genetic
map data. (Such data are found, for example, in V.
McKusick, Mendelian Inheritance in Man, available on-line
through Johns Hopkins University Welch Medical Library.)
The relationship between genes and disease, mapped to the
same chromosomal region, can then be identified through
linkage analysis (co-inheritance of physically adjacent
genes), described in, e.g., Egeland et al. (1987) Nature
325:783-787.

Moreover, differences in the DNA sequences between
individuals affected and unaffected with a disease
associated with a gene of the invention can be
determined. If a mutation is observed in some or all of
the affected individuals but not in any unaffected
individuals, then the mutation is likely to be the
causative agent of the particular disease. Comparison of
affected and unaffected individuals generally involves
first looking for structural alterations in the
chromosomes such as deletions or translocations that are
visible from chromosome spreads or detectable using PCR
based on that DNA sequence. Ultimately, complete
sequencing of genes from several individuals can be
performed to confirm the presence of a mutation and to
distinguish mutations from polymorphisms.
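
The criterion stated above (a mutation observed in some or all
affected individuals but in no unaffected individual) can be
sketched as a set operation. This is an illustration only: the
function name, the per-individual sets of variant labels, and
the labels themselves are hypothetical.

```python
def candidate_causative_variants(affected, unaffected):
    """Return variants seen in at least one affected and in no unaffected individual.

    Each argument is a list of sets, one set of variant labels per individual.
    """
    seen_in_affected = set().union(*affected)
    seen_in_unaffected = set().union(*unaffected)
    return seen_in_affected - seen_in_unaffected


# Hypothetical variant calls for two affected and two unaffected individuals.
affected = [{"c.100A>G", "c.250C>T"}, {"c.100A>G"}]
unaffected = [{"c.250C>T"}, set()]
candidates = candidate_causative_variants(affected, unaffected)
```

Here the variant shared with an unaffected individual is
excluded as a likely polymorphism, leaving the other as a
candidate for confirmation by complete sequencing.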

2. Tissue Typing

The nucleic acid sequences of the present invention can
also be used to identify individuals from minute
biological samples. The United States military, for
example, is considering the use of restriction fragment
length polymorphism (RFLP) for identification of its
personnel. In this technique, an individual's genomic
DNA is digested with one or more restriction enzymes, and
probed on a Southern blot to yield unique bands for
identification. This method does not suffer from the
current limitations of "Dog Tags" which can be lost,
switched, or stolen, making positive identification
difficult. The sequences of the present invention are
useful as additional DNA markers for RFLP (described in
U.S. Patent 5,272,057).

Furthermore, the sequences of the present invention can
be used to provide an alternative technique which
determines the actual base-by-base DNA sequence of
selected portions of an individual's genome. Thus, the
nucleic acid sequences described herein can be used to
prepare two PCR primers from the 5' and 3' ends of the
sequences. These primers can then be used to amplify an
individual's DNA and subsequently sequence it.

Panels of corresponding DNA sequences from individuals,
prepared in this manner, can provide unique individual
identifications, as each individual will have a unique
set of such DNA sequences due to allelic differences.
The sequences of the present invention can be used to
obtain such identification sequences from individuals and
from tissue. The nucleic acid sequences of the invention
uniquely represent portions of the human genome. Allelic
variation occurs to some degree in the coding regions of
these sequences, and to a greater degree in the noncoding
regions. It is estimated that allelic variation between
individual humans occurs with a frequency of about once
per each 500 bases. Each of the sequences described
herein can, to some degree, be used as a standard against
which DNA from an individual can be compared for
identification purposes. Because greater numbers of
polymorphisms occur in the noncoding regions, fewer
sequences are necessary to differentiate individuals.
For example, the noncoding sequences of SEQ ID NO:1 can
comfortably provide positive individual identification
with a panel of perhaps 10 to 1,000 primers which each
yield a noncoding amplified sequence of 100 bases. If
predicted coding sequences, such as those in SEQ ID NO:3,
are used, a more appropriate number of primers for
positive individual identification would be 500-2,000.
If a panel of reagents from the nucleic acid sequences
described herein is used to generate a unique
identification database for an individual, those same
reagents can later be used to identify tissue from that
individual. Using the unique identification database,
positive identification of the individual, living or
dead, can be made from extremely small tissue samples.
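
As a rough worked example of the figures given above, the
expected number of variable positions covered by a panel can be
estimated from the quoted average of about one allelic
difference per 500 bases. The sketch below assumes, for
simplicity, that this average rate applies uniformly to every
amplified sequence; the function name is hypothetical, and the
text itself notes that noncoding regions vary more than coding
regions.

```python
def expected_variable_sites(n_amplicons, amplicon_length=100, bases_per_variant=500):
    """Expected number of variable positions across a panel of amplified sequences.

    Assumes one allelic difference per `bases_per_variant` bases, applied
    uniformly (a simplifying assumption, not a figure from the text for
    any particular region).
    """
    return n_amplicons * amplicon_length / bases_per_variant


# Panels of 10 and 1,000 primers, each yielding 100 bases.
low = expected_variable_sites(10)
high = expected_variable_sites(1000)
```

Under that assumption a 10-primer panel covers about 2
variable positions and a 1,000-primer panel about 200, which
indicates why larger panels are suggested where the sequences
used are less polymorphic.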

3. Use of Partial Gene Sequences in Forensic Biology

DNA-based identification techniques can also be used in
forensic biology. Forensic biology is a scientific field
employing genetic typing of biological evidence found at
a crime scene as a means for positively identifying, for
example, a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA
sequences taken from very small biological samples such
as tissues, e.g., hair or skin, or body fluids, e.g.,
blood, saliva, or semen found at a crime scene. The
amplified sequence can then be compared to a standard,
thereby allowing identification of the origin of the
biological sample.
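
The comparison of an amplified sequence to a standard can be
sketched, for illustration only, as a position-by-position
check of two aligned sequences. The function names and example
sequences are hypothetical, and the sketch assumes the two
sequences have already been aligned to equal length.

```python
def differing_positions(sample, standard):
    """Return 0-based positions at which two equal-length sequences differ."""
    if len(sample) != len(standard):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(sample.upper(), standard.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def matches_standard(sample, standard):
    """True when the sample sequence is identical to the standard."""
    return not differing_positions(sample, standard)


# Hypothetical amplified sequence compared with a reference standard.
mismatches = differing_positions("ACGTTGCA", "ACGATGCA")
```

A real comparison would also have to account for sequencing
error and heterozygous positions before treating a difference
as excluding a match.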

The sequences of the present invention can be used to
provide polynucleotide reagents, e.g., PCR primers,
targeted to specific loci in the human genome, which can
enhance the reliability of DNA-based forensic
identifications by, for example, providing another
"identification marker" (i.e., another DNA sequence that
is unique to a particular individual). As mentioned
above, actual base sequence information can be used for
identification as an accurate alternative to patterns
formed by restriction enzyme generated fragments.
Sequences targeted to noncoding regions are particularly
appropriate for this use as greater numbers of
polymorphisms occur in the noncoding regions, making it
easier to differentiate individuals using this technique.
Examples of polynucleotide reagents include the nucleic
acid sequences of the invention or portions thereof,
e.g., fragments derived from noncoding regions having a
length of at least 20 or 30 bases.

The nucleic acid sequences described herein can further
be used to provide polynucleotide reagents, e.g., labeled
or labelable probes which can be used in, for example, an
in situ hybridization technique, to identify a specific
tissue, e.g., brain tissue. This can be very useful in
cases where a forensic pathologist is presented with a
tissue of unknown origin. Panels of such probes can be
used to identify tissue by species and/or by organ type.



C. Predictive Medicine 

The present invention also pertains to the field of
predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring
clinical trials are used for prognostic (predictive)
purposes to thereby treat an individual prophylactically.
Accordingly, one aspect of the present invention relates
to diagnostic assays for determining expression of a
polypeptide or nucleic acid of the invention and/or
activity of a polypeptide of the invention, in the
context of a biological sample (e.g., blood, serum,
cells, tissue) to thereby determine whether an individual
is afflicted with a disease or disorder, or is at risk of
developing a disorder, associated with aberrant
expression or activity of a polypeptide of the invention.
The invention also provides for prognostic (or
predictive) assays for determining whether an individual
is at risk of developing a disorder associated with
aberrant expression or activity of a polypeptide of the
invention. For example, mutations in a gene of the
invention can be assayed in a biological sample. Such
assays can be used for prognostic or predictive purposes
to thereby prophylactically treat an individual prior to
the onset of a disorder characterized by or associated
with aberrant expression or activity of a polypeptide of
the invention.

Another aspect of the invention provides methods for
determining expression of a nucleic acid or polypeptide
of the invention or activity of a polypeptide of the
invention in an individual to thereby select appropriate
therapeutic or prophylactic agents for that individual
(referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents
(e.g., drugs) for therapeutic or prophylactic treatment
of an individual based on the genotype of the individual
(e.g., the genotype of the individual examined to
determine the ability of the individual to respond to a
particular agent).

Yet another aspect of the invention pertains to
monitoring the influence of agents (e.g., drugs or other
compounds) on the expression or activity of a polypeptide
of the invention in clinical trials.
These and other agents are described in further detail
in the following sections.

1. Diagnostic Assays

An exemplary method for detecting the presence or
absence of a polypeptide or nucleic acid of the invention
in a biological sample involves obtaining a biological
sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention such that the presence of a polypeptide or
nucleic acid of the invention is detected in the
biological sample. A preferred agent for detecting mRNA
or genomic DNA encoding a polypeptide of the invention is
a labeled nucleic acid probe capable of hybridizing to
mRNA or genomic DNA encoding a polypeptide of the
invention. The nucleic acid probe can be, for example, a
full-length cDNA, such as the nucleic acid of SEQ ID
NOs:1-22, 34-43, and - or a portion thereof, such
as an oligonucleotide of at least 15, 30, 50, 100, 250 or
500 nucleotides in length and sufficient to specifically
hybridize under stringent conditions to a mRNA or genomic
DNA encoding a polypeptide of the invention. Other
suitable probes for use in the diagnostic assays of the
invention are described herein.

A preferred agent for detecting a polypeptide of the
invention is an antibody capable of binding to a
polypeptide of the invention, preferably an antibody with
a detectable label. Antibodies can be polyclonal, or
more preferably, monoclonal. An intact antibody, or a
fragment thereof (e.g., Fab or F(ab')2) can be used. The
term "labeled", with regard to the probe or antibody, is
intended to encompass direct labeling of the probe or
antibody by coupling (i.e., physically linking) a
detectable substance to the probe or antibody, as well as
indirect labeling of the probe or antibody by reactivity
with another reagent that is directly labeled. Examples
of indirect labeling include detection of a primary
antibody using a fluorescently labeled secondary antibody
and end-labeling of a DNA probe with biotin such that it
can be detected with fluorescently labeled streptavidin.
The term "biological sample" is intended to include
tissues, cells and biological fluids isolated from a
subject, as well as tissues, cells and fluids present
within a subject. That is, the detection method of the
invention can be used to detect mRNA, protein, or genomic
DNA in a biological sample in vitro as well as in vivo.
For example, in vitro techniques for detection of mRNA
include Northern hybridizations and in situ
hybridizations. In vitro techniques for detection of a
polypeptide of the invention include enzyme linked
immunosorbent assays (ELISAs), Western blots,
immunoprecipitations and immunofluorescence. In vitro
techniques for detection of genomic DNA include Southern
hybridizations. Furthermore, in vivo techniques for
detection of a polypeptide of the invention include
introducing into a subject a labeled antibody directed
against the polypeptide. For example, the antibody can
be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging
techniques.

In one embodiment, the biological sample contains
protein molecules from the test subject. Alternatively,
the biological sample can contain mRNA molecules from the
test subject or genomic DNA molecules from the test
subject. A preferred biological sample is a peripheral
blood leukocyte sample isolated by conventional means
from a subject.

In another embodiment, the methods further involve
obtaining a control biological sample from a control
subject, contacting the control sample with a compound or
agent capable of detecting a polypeptide of the invention
or mRNA or genomic DNA encoding a polypeptide of the
invention, such that the presence of the polypeptide or
mRNA or genomic DNA encoding the polypeptide is detected
in the biological sample, and comparing the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the control sample with the presence of
the polypeptide or mRNA or genomic DNA encoding the
polypeptide in the test sample.

The invention also encompasses kits for detecting the
presence of a polypeptide or nucleic acid of the
invention in a biological sample (a test sample). Such
kits can be used to determine if a subject is suffering
from or is at increased risk of developing a disorder
associated with aberrant expression of a polypeptide of
the invention (e.g., an immunological disorder). For
example, the kit can comprise a labeled compound or agent
capable of detecting the polypeptide or mRNA encoding the
polypeptide in a biological sample and means for
determining the amount of the polypeptide or mRNA in the
sample (e.g., an antibody which binds the polypeptide or
an oligonucleotide probe which binds to DNA or mRNA
encoding the polypeptide). Kits can also include
instructions for observing that the tested subject is
suffering from or is at risk of developing a disorder
associated with aberrant expression of the polypeptide if
the amount of the polypeptide or mRNA encoding the
polypeptide is above or below a normal level.

For antibody-based kits, the kit can comprise, for
example: (1) a first antibody (e.g., attached to a solid
support) which binds to a polypeptide of the invention;
and, optionally, (2) a second, different antibody which
binds to either the polypeptide or the first antibody and
is conjugated to a detectable agent.

For oligonucleotide-based kits, the kit can comprise,
for example: (1) an oligonucleotide, e.g., a detectably
labeled oligonucleotide, which hybridizes to a nucleic
acid sequence encoding a polypeptide of the invention or
(2) a pair of primers useful for amplifying a nucleic
acid molecule encoding a polypeptide of the invention.

The kit can also comprise, e.g., a buffering agent, a
preservative, or a protein stabilizing agent. The kit
can also comprise components necessary for detecting the
detectable agent (e.g., an enzyme or a substrate). The
kit can also contain a control sample or a series of
control samples which can be assayed and compared to the
test sample contained. Each component of the kit is
usually enclosed within an individual container and all
of the various containers are within a single package
along with instructions for observing whether the tested
subject is suffering from or is at risk of developing a
disorder associated with aberrant expression of the
polypeptide.

2. Prognostic Assays

The methods described herein can furthermore be
utilized as diagnostic or prognostic assays to identify
subjects having or at risk of developing a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, the
assays described herein, such as the preceding diagnostic
assays or the following assays, can be utilized to
identify a subject having or at risk of developing a
disorder associated with aberrant expression or activity
of a polypeptide of the invention. Alternatively, the
prognostic assays can be utilized to identify a subject
having or at risk for developing such a disease or
disorder. Thus, the present invention provides a method
in which a test sample is obtained from a subject and a
polypeptide or nucleic acid (e.g., mRNA, genomic DNA) of
the invention is detected, wherein the presence of the
polypeptide or nucleic acid is diagnostic for a subject
having or at risk of developing a disease or disorder
associated with aberrant expression or activity of the
polypeptide. As used herein, a "test sample" refers to a
biological sample obtained from a subject of interest.
For example, a test sample can be a biological fluid
(e.g., serum), cell sample, or tissue.

Furthermore, the prognostic assays described herein can
be used to determine whether a subject can be
administered an agent (e.g., an agonist, antagonist,
peptidomimetic, protein, peptide, nucleic acid, small
molecule, or other drug candidate) to treat a disease or
disorder associated with aberrant expression or activity
of a polypeptide of the invention. For example, such
methods can be used to determine whether a subject can be
effectively treated with a specific agent or class of
agents (e.g., agents of a type which decrease activity of
the polypeptide). Thus, the present invention provides
methods for determining whether a subject can be
effectively treated with an agent for a disorder
associated with aberrant expression or activity of a
polypeptide of the invention in which a test sample is
obtained and the polypeptide or nucleic acid encoding the
polypeptide is detected (e.g., wherein the presence of
the polypeptide or nucleic acid is diagnostic for a
subject that can be administered the agent to treat a
disorder associated with aberrant expression or activity
of the polypeptide).

The methods of the invention can also be used to detect
genetic lesions or mutations in a gene of the invention,
thereby determining if a subject with the lesioned gene
is at risk for a disorder characterized by aberrant
expression or activity of a polypeptide of the invention.
In preferred embodiments, the methods include detecting,
in a sample of cells from the subject, the presence or
absence of a genetic lesion or mutation characterized by
at least one of an alteration affecting the integrity of
a gene encoding the polypeptide of the invention, or the
mis-expression of the gene encoding the polypeptide of
the invention. For example, such genetic lesions or
mutations can be detected by ascertaining the existence
of at least one of: 1) a deletion of one or more
nucleotides from the gene; 2) an addition of one or more
nucleotides to the gene; 3) a substitution of one or more
nucleotides of the gene; 4) a chromosomal rearrangement
of the gene; 5) an alteration in the level of a messenger
RNA transcript of the gene; 6) an aberrant modification
of the gene, such as of the methylation pattern of the
genomic DNA; 7) the presence of a non-wild type splicing
pattern of a messenger RNA transcript of the gene; 8) a
non-wild type level of the protein encoded by the gene;
9) an allelic loss of the gene; and 10) an inappropriate
post-translational modification of the protein encoded by
the gene. As described herein, there are a large number
of assay techniques known in the art which can be used
for detecting lesions in a gene.

In certain embodiments, detection of the lesion
involves the use of a probe/primer in a polymerase chain
reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and
4,683,202), such as anchor PCR or RACE PCR, or,
alternatively, in a ligation chain reaction (LCR) (see,
e.g., Landegren et al. (1988) Science 241:1077-1080; and
Nakazawa et al. (1994) Proc. Natl. Acad. Sci. USA
91:360-364), the latter of which can be particularly
useful for detecting point mutations in a gene (see,
e.g., Abravaya et al. (1995) Nucleic Acids Res.
23:675-682). This method can include the steps of
collecting a sample of cells from a patient, isolating
nucleic acid (e.g., genomic, mRNA or both) from the cells
of the sample, contacting the nucleic acid sample with
one or more primers which specifically hybridize to the
selected gene under conditions such that hybridization
and amplification of the gene (if present) occurs, and
detecting the presence or absence of an amplification
product, or detecting the size of the amplification
product and comparing the length to a control sample. It
is anticipated that PCR and/or LCR may be desirable to
use as a preliminary amplification step in conjunction
with any of the techniques used for detecting mutations
described herein.

Alternative amplification methods include: self
sustained sequence replication (Guatelli et al. (1990)
Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional
amplification system (Kwoh et al. (1989) Proc. Natl.
Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi
et al. (1988) Bio/Technology 6:1197), or any other
nucleic acid amplification method, followed by the
detection of the amplified molecules using techniques
well known to those of skill in the art. These detection
schemes are especially useful for the detection of
nucleic acid molecules if such molecules are present in
very low numbers.

In an alternative embodiment, mutations in a selected
gene from a sample cell can be identified by alterations
in restriction enzyme cleavage patterns. For example,
sample and control DNA is isolated, amplified
(optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined
by gel electrophoresis and compared. Differences in
fragment length sizes between sample and control DNA
indicate mutations in the sample DNA. Moreover, sequence
specific ribozymes (see, e.g., U.S. Patent
No. 5,498,531) can be used to score for the presence of
specific mutations by development or loss of a ribozyme
cleavage site.

In other embodiments, genetic mutations can be
identified by hybridizing a sample and control nucleic
acids, e.g., DNA or RNA, to high density arrays
containing hundreds or thousands of oligonucleotide
probes (Cronin et al. (1996) Human Mutation 7:244-255;
Kozal et al. (1996) Nature Medicine 2:753-759). For
example, genetic mutations can be identified in
two-dimensional arrays containing light-generated DNA
probes as described in Cronin et al., supra. Briefly, a
first hybridization array of probes can be used to scan
through long stretches of DNA in a sample and control to
identify base changes between the sequences by making
linear arrays of sequential overlapping probes. This step
allows the identification of point mutations. This step
is followed by a second hybridization array that allows
the characterization of specific mutations by using
smaller, specialized probe arrays complementary to all
variants or mutations detected. Each mutation array is
composed of parallel probe sets, one complementary to the
wild-type gene and the other complementary to the mutant
gene.
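
The first scanning step, a linear array of sequential
overlapping probes, can be sketched as follows. This is a
simplified model and not the method of Cronin et al.:
hybridization is approximated as an exact string match at
the same offset, the sequences are assumed to be of equal
length with a single interior substitution, and the names
tile_probes and localize_change are hypothetical.

```python
def tile_probes(seq, probe_len=8, step=1):
    """Return (offset, probe) pairs for overlapping probes across seq."""
    last = len(seq) - probe_len + 1
    return [(i, seq[i:i + probe_len]) for i in range(0, last, step)]


def localize_change(control, sample, probe_len=8):
    """Return the (first, last) positions covered by every failing probe,
    or None if all control-derived probes match the sample."""
    failing = [i for i, probe in tile_probes(control, probe_len)
               if sample[i:i + probe_len] != probe]
    if not failing:
        return None
    # Positions common to all failing probe windows
    return (max(failing), min(failing) + probe_len - 1)


# Hypothetical control sequence and a sample with one substitution
control = "ACGTACGTTAGCCGATAGGCTA"
sample = control[:10] + "T" + control[11:]

region = localize_change(control, sample)
```

Because each base is covered by several overlapping probes,
the overlap of the failing probes narrows the change down
to a single position in this toy case.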

In yet another embodiment, any of a variety of
sequencing reactions known in the art can be used to
directly sequence the selected gene and detect mutations
by comparing the sequence of the sample nucleic acids
with the corresponding wild-type (control) sequence.
Examples of sequencing reactions include those based on
techniques developed by Maxam and Gilbert ((1977) Proc.
Natl. Acad. Sci. USA 74:560) or Sanger ((1977) Proc.
Natl. Acad. Sci. USA 74:5463). It is also contemplated
that any of a variety of automated sequencing procedures
can be utilized when performing the diagnostic assays
((1995) Bio/Techniques 19:448), including sequencing by
mass spectrometry (see, e.g., PCT Publication No. WO
94/16101; Cohen et al. (1996) Adv. Chromatogr.
36:127-162; and Griffin et al. (1993) Appl. Biochem.
Biotechnol. 38:147-159).
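
Once sample and wild-type sequences have been obtained,
the comparison step can be sketched as below. The sketch
assumes the two sequences are already aligned and of equal
length, so it reports substitutions only; the function name
point_differences and the example sequences are
hypothetical.

```python
def point_differences(wild_type, sample):
    """Return (1-based position, wild-type base, sample base) for each
    substitution between two aligned, equal-length sequences."""
    if len(wild_type) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, w, s)
            for i, (w, s) in enumerate(zip(wild_type, sample))
            if w != s]


# Hypothetical wild-type (control) and sample reads
differences = point_differences("ATGGCCAAG", "ATGGTCAAG")
```

Insertions and deletions would require a proper alignment
step before this comparison.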

Other methods for detecting mutations in a selected
gene include methods in which protection from cleavage
agents is used to detect mismatched bases in RNA/RNA or
RNA/DNA heteroduplexes (Myers et al. (1985) Science
230:1242). In general, the technique of "mismatch
cleavage" entails providing heteroduplexes formed by
hybridizing (labeled) RNA or DNA containing the wild-type
sequence with potentially mutant RNA or DNA obtained from
a tissue sample. The double-stranded duplexes are
treated with an agent which cleaves single-stranded
regions of the duplex, such as those that will exist due
to basepair mismatches between the control and sample
strands. RNA/DNA duplexes can be treated with RNase to
digest mismatched regions, and DNA/DNA hybrids can be
treated with S1 nuclease to digest mismatched regions.
In other embodiments, either DNA/DNA or RNA/DNA duplexes
can be treated with hydroxylamine or osmium tetroxide and
with piperidine in order to digest mismatched regions.
After digestion of the mismatched regions, the resulting
material is then separated by size on denaturing
polyacrylamide gels to determine the site of mutation.
See, e.g., Cotton et al. (1988) Proc. Natl. Acad. Sci.
USA 85:4397; Saleeba et al. (1992) Methods Enzymol.
217:286-295. In a preferred embodiment, the control DNA
or RNA can be labeled for detection.

In still another embodiment, the mismatch cleavage
reaction employs one or more proteins that recognize
mismatched base pairs in double-stranded DNA (so called
"DNA mismatch repair" enzymes) in defined systems for
detecting and mapping point mutations in cDNAs obtained
from samples of cells. For example, the mutY enzyme of
E. coli cleaves A at G/A mismatches and the thymidine DNA
glycosylase from HeLa cells cleaves T at G/T mismatches
(Hsu et al. (1994) Carcinogenesis 15:1657-1662).
According to an exemplary embodiment, a probe based on a
selected sequence, e.g., a wild-type sequence, is
hybridized to a cDNA or other DNA product from a test
cell(s). The duplex is treated with a DNA mismatch
repair enzyme, and the cleavage products, if any, can be
detected from electrophoresis protocols or the like.
See, e.g., U.S. Patent No. 5,459,039.

In other embodiments, alterations in electrophoretic
mobility will be used to identify mutations in genes.
For example, single strand conformation polymorphism
(SSCP) may be used to detect differences in
electrophoretic mobility between mutant and wild type
nucleic acids (Orita et al. (1989) Proc. Natl. Acad. Sci.
USA 86:2766; see also Cotton (1993) Mutat. Res.
285:125-144; Hayashi (1992) Genet. Anal. Tech. Appl.
9:73-79). Single-stranded DNA fragments of sample and
control nucleic acids will be denatured and allowed to
renature. The secondary structure of single-stranded
nucleic acids varies according to sequence, and the
resulting alteration in electrophoretic mobility enables
the detection of even a single base change. The DNA
fragments may be labeled or detected with labeled probes.
The sensitivity of the assay may be enhanced by using RNA
(rather than DNA), in which the secondary structure is
more sensitive to a change in sequence. In a preferred
embodiment, the subject method utilizes heteroduplex
analysis to separate double stranded heteroduplex
molecules on the basis of changes in electrophoretic
mobility (Keen et al. (1991) Trends Genet. 7:5).

In yet another embodiment, the movement of mutant or
wild-type fragments in polyacrylamide gels containing a
gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (DGGE) (Myers et al. (1985)
Nature 313:495). When DGGE is used as the method of
analysis, DNA will be modified to ensure that it does not
completely denature, for example by adding a GC clamp of
approximately 40 bp of high-melting GC-rich DNA by PCR.
In a further embodiment, a temperature gradient is used
in place of a denaturing gradient to identify differences
in the mobility of control and sample DNA (Rosenbaum and
Reissner (1987) Biophys. Chem. 265:12753).

Examples of other techniques for detecting point
mutations include, but are not limited to, selective
oligonucleotide hybridization, selective amplification,
or selective primer extension. For example,
oligonucleotide primers may be prepared in which the
known mutation is placed centrally and then hybridized to
target DNA under conditions which permit hybridization
only if a perfect match is found (Saiki et al. (1986)
Nature 324:163; Saiki et al. (1989) Proc. Natl. Acad.
Sci. USA 86:6230). Such allele specific oligonucleotides
are hybridized to PCR amplified target DNA or a number of
different mutations when the oligonucleotides are
attached to the hybridizing membrane and hybridized with
labeled target DNA.
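
Placing the known mutation centrally in an allele specific
oligonucleotide can be sketched as follows. The sketch
makes simplifying assumptions: probes are written in the
same orientation as the target strand shown, and "perfect
match" hybridization is modeled as an exact substring
match. The names aso_probes and hybridizes, the flank
length, and the sequence are hypothetical.

```python
def aso_probes(wild_type, pos, mutant_base, flank=9):
    """Return (wild-type probe, mutant probe), each 2*flank+1 bases long,
    with the variable base at 0-based position pos in the center."""
    if pos - flank < 0 or pos + flank >= len(wild_type):
        raise ValueError("not enough flanking sequence around pos")
    wt_probe = wild_type[pos - flank:pos + flank + 1]
    mut_probe = wt_probe[:flank] + mutant_base + wt_probe[flank + 1:]
    return wt_probe, mut_probe


def hybridizes(probe, target):
    """Stand-in for stringent hybridization: perfect matches only."""
    return probe in target


# Hypothetical wild-type target and a mutant target with A -> G at pos 13
wild_type = "GATTACAGGCTTAACCGGTTAGCATCG"
mutant = wild_type[:13] + "G" + wild_type[14:]

wt_probe, mut_probe = aso_probes(wild_type, 13, "G", flank=5)
```

Under this model each probe hybridizes only to the target
carrying its own allele, which is the behavior the
stringent conditions are intended to produce.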

Alternatively, allele specific amplification technology
which depends on selective PCR amplification may be used
in conjunction with the instant invention.
Oligonucleotides used as primers for specific
amplification may carry the mutation of interest in the
center of the molecule (so that amplification depends on
differential hybridization) (Gibbs et al. (1989) Nucleic
Acids Res. 17:2437-2448) or at the extreme 3' end of one
primer where, under appropriate conditions, mismatch can
prevent or reduce polymerase extension (Prosser (1993)
Tibtech 11:238). In addition, it may be desirable to
introduce a novel restriction site in the region of the
mutation to create cleavage-based detection (Gasparini et
al. (1992) Mol. Cell Probes 6:1). It is anticipated that
in certain embodiments amplification may also be
performed using Taq ligase for amplification (Barany
(1991) Proc. Natl. Acad. Sci. USA 88:189). In such
cases, ligation will occur only if there is a perfect
match at the 3' end of the 5' sequence making it possible
to detect the presence of a known mutation at a specific
site by looking for the presence or absence of
amplification.
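
The 3' end variant of allele specific amplification can be
sketched with a toy model in which extension occurs only
when the primer's 3' terminal base matches the sample at
the variant position. Real polymerases are less strict, so
this is an idealization; the names allele_specific_primer,
three_prime_match and call_allele, the primer length, and
the sequences are hypothetical.

```python
def allele_specific_primer(reference, pos, allele, length=10):
    """Build a primer whose 3' terminal base is the given allele at pos."""
    start = pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the primer")
    return reference[start:pos] + allele


def three_prime_match(primer, sample, pos):
    """Idealized extension rule: extend only if the 3' base matches."""
    return sample[pos] == primer[-1]


def call_allele(reference, sample, pos, mutant_base, length=10):
    """Report which allele-specific primer would give a product."""
    wt_primer = allele_specific_primer(reference, pos, reference[pos], length)
    mut_primer = allele_specific_primer(reference, pos, mutant_base, length)
    return {
        "wild_type": three_prime_match(wt_primer, sample, pos),
        "mutant": three_prime_match(mut_primer, sample, pos),
    }


# Hypothetical reference and a sample carrying A -> G at position 13
reference = "GATTACAGGCTTAACCGGTTAGCATCG"
mutant_sample = reference[:13] + "G" + reference[14:]

result = call_allele(reference, mutant_sample, 13, "G")
```

The presence or absence of product from each primer then
indicates which allele the sample carries.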

The methods described herein may be performed, for
example, by utilizing pre-packaged diagnostic kits
comprising at least one probe nucleic acid or antibody
reagent described herein, which may be conveniently used,
e.g., in clinical settings to diagnose patients
exhibiting symptoms or family history of a disease or
illness involving a gene encoding a polypeptide of the
invention.

Furthermore, any cell type or tissue, preferably
peripheral blood leukocytes, in which the polypeptide of
the invention is expressed may be utilized in the
prognostic assays described herein.

3. Pharmacogenomics

Agents, or modulators which have a stimulatory or
inhibitory effect on activity or expression of a
polypeptide of the invention as identified by a screening
assay described herein can be administered to individuals
to treat (prophylactically or therapeutically) disorders
associated with aberrant activity of the polypeptide. In
conjunction with such treatment, the pharmacogenomics
(i.e., the study of the relationship between an
individual's genotype and that individual's response to a
foreign compound or drug) of the individual may be
considered. Differences in metabolism of therapeutics
can lead to severe toxicity or therapeutic failure by
altering the relation between dose and blood
concentration of the pharmacologically active drug. Thus,
the pharmacogenomics of the individual permits the
selection of effective agents (e.g., drugs) for
prophylactic or therapeutic treatments based on a
consideration of the individual's genotype. Such
pharmacogenomics can further be used to determine
appropriate dosages and therapeutic regimens.
Accordingly, the activity of a polypeptide of the
invention, expression of a nucleic acid of the
invention, or mutation content of a gene of the invention
in an individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual.

Pharmacogenomics deals with clinically significant
hereditary variations in the response to drugs due to
altered drug disposition and abnormal action in affected
persons. See, e.g., Linder (1997) Clin. Chem.
43(2):254-266. In general, two types of pharmacogenetic
conditions can be differentiated. Genetic conditions
transmitted as a single factor altering the way drugs act
on the body are referred to as "altered drug action."
Genetic conditions transmitted as single factors altering
the way the body acts on drugs are referred to as
"altered drug metabolism". These pharmacogenetic
conditions can occur either as rare defects or as
polymorphisms. For example, glucose-6-phosphate
dehydrogenase deficiency (G6PD) is a common inherited
enzymopathy in which the main clinical complication is
haemolysis after ingestion of oxidant drugs
(anti-malarials, sulfonamides, analgesics, nitrofurans)
and consumption of fava beans.

As an illustrative embodiment, the activity of drug
metabolizing enzymes is a major determinant of both the
intensity and duration of drug action. The discovery of
genetic polymorphisms of drug metabolizing enzymes (e.g.,
N-acetyltransferase 2 (NAT 2) and cytochrome P450 enzymes
CYP2D6 and CYP2C19) has provided an explanation as to why
some patients do not obtain the expected drug effects or
show exaggerated drug response and serious toxicity after
taking the standard and safe dose of a drug. These
polymorphisms are expressed in two phenotypes in the
population, the extensive metabolizer (EM) and poor
metabolizer (PM). The prevalence of PM is different
among different populations. For example, the gene
coding for CYP2D6 is highly polymorphic and several
mutations have been identified in PM, which all lead to
the absence of functional CYP2D6. Poor metabolizers of
CYP2D6 and CYP2C19 quite frequently experience
exaggerated drug response and side effects when they
receive standard doses. If a metabolite is the active
therapeutic moiety, a PM will show no therapeutic
response, as demonstrated for the analgesic effect of
codeine mediated by its CYP2D6-formed metabolite
morphine. The other extreme are the so called
ultra-rapid metabolizers who do not respond to standard
doses. Recently, the molecular basis of ultra-rapid
metabolism has been identified to be due to CYP2D6 gene
amplification.

Thus, the activity of a polypeptide of the invention,
expression of a nucleic acid encoding the polypeptide, or
mutation content of a gene encoding the polypeptide in an
individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic
treatment of the individual. In addition,
pharmacogenetic studies can be used to apply genotyping
of polymorphic alleles encoding drug-metabolizing enzymes
to the identification of an individual's drug
responsiveness phenotype. This knowledge, when applied
to dosing or drug selection, can avoid adverse reactions
or therapeutic failure and thus enhance therapeutic or
prophylactic efficiency when treating a subject with a
modulator of activity or expression of the polypeptide,
such as a modulator identified by one of the exemplary
screening assays described herein.

4. Monitoring of Effects During Clinical Trials
Monitoring the influence of agents (e.g., drugs,
compounds) on the expression or activity of a polypeptide
of the invention (e.g., the ability to modulate aberrant
cell proliferation and/or differentiation) can be applied
not only in basic drug screening, but also in clinical
trials. For example, the effectiveness of an agent, as
determined by a screening assay as described herein, to
increase gene expression, protein levels or protein
activity, can be monitored in clinical trials of subjects
exhibiting decreased gene expression, protein levels, or
protein activity. Alternatively, the effectiveness of an
agent, as determined by a screening assay, to decrease
gene expression, protein levels or protein activity, can
be monitored in clinical trials of subjects exhibiting
increased gene expression, protein levels, or protein
activity. In such clinical trials, expression or
activity of a polypeptide of the invention and,
preferably, that of other polypeptides that have been
implicated in, for example, a cellular proliferation
disorder, can be used as a marker of the immune
responsiveness of a particular cell.

For example, and not by way of limitation, genes,
including those of the invention, that are modulated in
cells by treatment with an agent (e.g., compound, drug or
small molecule) which modulates activity or expression of
a polypeptide of the invention (e.g., as identified in a
screening assay described herein) can be identified.
Thus, to study the effect of agents on cellular
proliferation disorders, for example, in a clinical
trial, cells can be isolated and RNA prepared and
analyzed for the levels of expression of a gene of the
invention and other genes implicated in the disorder.
The levels of gene expression (i.e., a gene expression
pattern) can be quantified by Northern blot analysis or
RT-PCR, as described herein, or alternatively by
measuring the amount of protein produced, by one of the
methods as described herein, or by measuring the levels
of activity of a gene of the invention or other genes.
In this way, the gene expression pattern can serve as a
marker, indicative of the physiological response of the
cells to the agent. Accordingly, this response state may
be determined before, and at various points during,
treatment of the individual with the agent.

In a preferred embodiment, the present invention
provides a method for monitoring the effectiveness of
treatment of a subject with an agent (e.g., an agonist,
antagonist, peptidomimetic, protein, peptide, nucleic
acid, small molecule, or other drug candidate identified
by the screening assays described herein) comprising the
steps of (i) obtaining a pre-administration sample from a
subject prior to administration of the agent; (ii)
detecting the level of the polypeptide or nucleic acid of
the invention in the pre-administration sample; (iii)
obtaining one or more post-administration samples from
the subject; (iv) detecting the level of the
polypeptide or nucleic acid of the invention in the
post-administration samples; (v) comparing the level of the
polypeptide or nucleic acid of the invention in the
pre-administration sample with the level of the polypeptide
or nucleic acid of the invention in the
post-administration sample or samples; and (vi) altering the
administration of the agent to the subject accordingly.
For example, increased administration of the agent may be
desirable to increase the expression or activity of the
polypeptide to higher levels than detected, i.e., to
increase the effectiveness of the agent. Alternatively,
decreased administration of the agent may be desirable to
decrease expression or activity of the polypeptide to
lower levels than detected, i.e., to decrease the
effectiveness of the agent.
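
The comparison in steps (v) and (vi) can be sketched in code. The
following Python sketch is illustrative only: the function name, the
target level, the 10% tolerance, and the returned labels are
assumptions introduced here and are not part of the described method.
It further assumes that the agent is intended to raise the measured
level of the polypeptide or nucleic acid.

```python
from statistics import mean


def suggest_adjustment(pre_level, post_levels, target_level, tolerance=0.10):
    """Compare pre- and post-administration levels and suggest a dosing change.

    Hypothetical helper: assumes the agent is meant to raise the measured
    level toward target_level; the tolerance is an illustrative threshold.
    """
    if not post_levels:
        raise ValueError("at least one post-administration sample is required")
    post = mean(post_levels)          # summarize the post-administration samples
    change = post - pre_level         # step (v): compare with the baseline
    if post < target_level * (1 - tolerance):
        action = "increase"           # step (vi): level is still too low
    elif post > target_level * (1 + tolerance):
        action = "decrease"           # step (vi): level overshoots the target
    else:
        action = "maintain"
    return {"post_mean": post, "change": change, "action": action}
```

In practice the decision rule would depend on the assay, the disorder,
and whether the agent is an agonist or an antagonist; for an agent
meant to lower the level, the two comparisons would be reversed.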

C. Methods of Treatment 

The present invention provides for both prophylactic
and therapeutic methods of treating a subject at risk of
(or susceptible to) a disorder or having a disorder
associated with aberrant expression or activity of a
polypeptide of the invention.

1. Prophylactic Methods

In one aspect, the invention provides a method for
preventing in a subject a disease or condition
associated with an aberrant expression or activity of a
polypeptide of the invention, by administering to the
subject an agent which modulates expression or at least
one activity of the polypeptide. Subjects at risk for a
disease which is caused or contributed to by aberrant
expression or activity of a polypeptide of the invention
can be identified by, for example, any or a combination
of diagnostic or prognostic assays as described herein.
Administration of a prophylactic agent can occur prior to
the manifestation of symptoms characteristic of the
aberrancy, such that a disease or disorder is prevented
or, alternatively, delayed in its progression. Depending
on the type of aberrancy, for example, an agonist or
antagonist agent can be used for treating the subject.
The appropriate agent can be determined based on
screening assays described herein.

2. Therapeutic Methods

Another aspect of the invention pertains to methods of
modulating expression or activity of a polypeptide of the
invention for therapeutic purposes. The modulatory
method of the invention involves contacting a cell with
an agent that modulates one or more of the activities of
the polypeptide. An agent that modulates activity can be
an agent as described herein, such as a nucleic acid or a
protein, a naturally-occurring cognate ligand of the
polypeptide, a peptide, a peptidomimetic, or other small
molecule. In one embodiment, the agent stimulates one or
more of the biological activities of the polypeptide.
Examples of such stimulatory agents include the active
polypeptide of the invention and a nucleic acid molecule
encoding the polypeptide of the invention that has been
introduced into the cell. In another embodiment, the
agent inhibits one or more of the biological activities
of the polypeptide of the invention. Examples of such
inhibitory agents include antisense nucleic acid
molecules and antibodies. These modulatory methods can
be performed in vitro (e.g., by culturing the cell with
the agent) or, alternatively, in vivo (e.g., by
administering the agent to a subject). As such, the
present invention provides methods of treating an
individual afflicted with a disease or disorder
characterized by aberrant expression or activity of a
polypeptide of the invention. In one embodiment, the
method involves administering an agent (e.g., an agent
identified by a screening assay described herein), or
combination of agents that modulates (e.g., upregulates
or downregulates) expression or activity. In another
embodiment, the method involves administering a
polypeptide of the invention or a nucleic acid molecule
of the invention as therapy to compensate for reduced or
aberrant expression or activity of the polypeptide.
Stimulation of activity is desirable in situations in
which activity or expression is abnormally low or
downregulated and/or in which increased activity is
likely to have a beneficial effect. Conversely,
inhibition of activity is desirable in situations in
which activity or expression is abnormally high or
upregulated and/or in which decreased activity is likely
to have a beneficial effect.

This invention is further illustrated by the following
examples which should not be construed as limiting. The
contents of all references, patents and published patent
applications cited throughout this application are hereby
incorporated by reference.

EXAMPLES 

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189, and TANGO 187
were identified in a human prostate epithelial cell
library. TANGO 215 was identified in a human prostate
stromal cell library.

TANGO 180, TANGO 181, TANGO 182, TANGO 183, TANGO 184,
TANGO 185, TANGO 186, TANGO 188, TANGO 189, TANGO 215,
and TANGO 187 were identified by first analyzing clones
present in the two libraries to identify EST sequences
which potentially encode a signal peptide having at least
15 amino acids. Selected clones which include an EST
sequence that appeared to encode a signal peptide having
at least 15 amino acids were used to assemble additional
EST sequences to form potential full-length gene
sequences. The assembled full-length gene sequences were
then used to identify actual full-length clones in the
two libraries.
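
The text does not state how candidate signal peptides were recognized
in the EST sequences. Purely as an illustration of the kind of screen
involved, the Python sketch below translates an EST from its first ATG
and asks whether the N-terminal region is at least 15 residues long and
contains a hydrophobic stretch. The hydrophobic residue set, the
30-residue window, the 7-residue core, and all function names are
assumptions made here; this is a crude heuristic, not the procedure
actually used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue set, for illustration only


def translate_from_first_atg(est):
    """Translate from the first ATG to the first stop codon (or sequence end)."""
    est = est.upper()
    start = est.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(est) - 2, 3):
        aa = CODON_TABLE.get(est[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_hydrophobic_run(peptide):
    """Length of the longest uninterrupted run of hydrophobic residues."""
    best = run = 0
    for aa in peptide:
        run = run + 1 if aa in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def may_encode_signal_peptide(est, min_len=15, window=30, min_core=7):
    """Heuristic flag: long enough N-terminus with a hydrophobic core."""
    protein = translate_from_first_atg(est)
    if len(protein) < min_len:
        return False
    return longest_hydrophobic_run(protein[:window]) >= min_core
```

A dedicated signal peptide prediction program would normally replace
the hydrophobic-run test; the sketch only shows where such a filter
sits in the clone selection step.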

Deposit of Clones

Clones containing cDNA molecules encoding TANGO 180,
TANGO 181, TANGO 182, TANGO 183, TANGO 184, TANGO 185,
TANGO 186, TANGO 188, TANGO 189, TANGO 215 and TANGO 187
were deposited with the American Type Culture Collection
(Manassas, VA) as composite deposits.

Clones encoding TANGO 180, TANGO 181, TANGO 182,
TANGO 183, and TANGO 184 were deposited on September 25,
1998 with the American Type Culture Collection under
accession number ATCC 98901, from which each clone
comprising a particular cDNA clone is obtainable. This
deposit is a mixture of five strains, each carrying one
recombinant plasmid harboring a particular cDNA clone.
To distinguish the strains and isolate a strain harboring
a particular cDNA clone, one can first streak out an
aliquot of the mixture to single colonies on nutrient
medium (e.g., LB plates) supplemented with 100 μg/ml
ampicillin, grow single colonies, and then extract the
plasmid DNA using a standard minipreparation procedure.
Next, one can digest a sample of the DNA minipreparation
with a combination of the restriction enzymes Sal I and
Not I and resolve the resultant products on a 0.8%
agarose gel using standard DNA electrophoresis
conditions. The digest will liberate fragments as
follows:

TANGO 180 (EpT180)    1.2 kb and 2.7 kb
TANGO 181 (EpT181)    4.5 kb and 2.7 kb
TANGO 182 (EpT182)    two 2.7 kb fragments
TANGO 183 (EpT183)    1.6 kb and 2.7 kb
TANGO 184 (EpT184)    4.5 kb

The identity of the strains can be inferred from the
fragments liberated.
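
The inference from fragments to strain can be sketched as a lookup
against the table above. The following Python sketch is illustrative:
the function name and the 0.15 kb tolerance are assumptions introduced
here, and the input is taken to be the list of fragment sizes (in kb)
estimated from the gel.

```python
# Expected Sal I / Not I fragments (kb) for the ATCC 98901 deposit,
# copied from the table in the text and sorted smallest first.
EXPECTED_FRAGMENTS_KB = {
    "TANGO 180 (EpT180)": (1.2, 2.7),
    "TANGO 181 (EpT181)": (2.7, 4.5),
    "TANGO 182 (EpT182)": (2.7, 2.7),
    "TANGO 183 (EpT183)": (1.6, 2.7),
    "TANGO 184 (EpT184)": (4.5,),
}


def identify_strain(observed_kb, tolerance_kb=0.15):
    """Return every clone whose expected fragments match the observed sizes.

    tolerance_kb is an assumed allowance for sizing error on the gel.
    """
    observed = sorted(observed_kb)
    matches = []
    for clone, expected in EXPECTED_FRAGMENTS_KB.items():
        same_count = len(expected) == len(observed)
        if same_count and all(
            abs(o - e) <= tolerance_kb for o, e in zip(observed, expected)
        ):
            matches.append(clone)
    return matches
```

Note that the two 2.7 kb fragments listed for TANGO 182 would be
expected to co-migrate on a gel, so a single band near 2.7 kb should be
entered as two fragments for this lookup to match.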

Clones encoding TANGO 185, TANGO 186, TANGO 187, TANGO
188 and TANGO 189 (splice variant 1) were deposited on
September 25, 1998 with the American Type Culture
Collection under accession number ATCC 98900, from which
each strain comprising a particular cDNA clone is
obtainable. The deposit is a mixture of five strains,
each carrying one recombinant plasmid harboring a
particular cDNA clone. To distinguish the strains and
isolate a strain harboring a particular cDNA clone, one
can first streak out an aliquot of the mixture to single
colonies on nutrient medium (e.g., LB plates)
supplemented with 100 μg/ml ampicillin, grow single
colonies, and then extract the plasmid DNA using a
standard minipreparation procedure. Next, one can digest
a sample of the DNA minipreparation with a combination of
the restriction enzymes Sal I and Not I and resolve the
resultant products on a 0.8% agarose gel using standard
DNA electrophoresis conditions. The digest will liberate
one vector fragment of 2.7 kb common to all strains, and
one insert-specific fragment as follows:

TANGO 185 (EpT185)       2.1 kb
TANGO 186 (EpT186)       3.7 kb
TANGO 187 (EpT187)       2.6 kb
TANGO 188 (EpT188)       2.0 kb
TANGO 189 (EpT189sv1)    1.3 kb

The identity of the strains can be inferred from the
fragments liberated.
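
Because every strain in this deposit shares the 2.7 kb vector fragment,
only the insert-specific band is informative. The Python sketch below
is a hypothetical illustration of that reasoning: it sets aside the
band closest to the vector size and ranks the clones by how close their
listed insert size is to the remaining band. The function name and the
choice to return a runner-up are assumptions made here.

```python
VECTOR_KB = 2.7  # vector fragment common to all strains, per the text

# Insert-specific fragments (kb) for the ATCC 98900 deposit, from the table.
INSERT_KB = {
    "TANGO 185 (EpT185)": 2.1,
    "TANGO 186 (EpT186)": 3.7,
    "TANGO 187 (EpT187)": 2.6,
    "TANGO 188 (EpT188)": 2.0,
    "TANGO 189 (EpT189sv1)": 1.3,
}


def identify_insert(bands_kb):
    """Return (best match, runner-up) for a two-band Sal I / Not I digest."""
    if len(bands_kb) != 2:
        raise ValueError("expected exactly two bands: vector plus insert")
    # The band nearer 2.7 kb is treated as vector; the other as insert.
    by_vector_distance = sorted(bands_kb, key=lambda b: abs(b - VECTOR_KB))
    insert_band = by_vector_distance[1]
    ranked = sorted(INSERT_KB, key=lambda clone: abs(INSERT_KB[clone] - insert_band))
    return ranked[0], ranked[1]
```

The runner-up is returned because some listed sizes differ by only
0.1 kb (2.0 kb versus 2.1 kb, and 2.6 kb versus the 2.7 kb vector),
which may be difficult to tell apart by eye; in such cases the result
would need confirmation, for example by sequencing.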

A clone encoding TANGO 215 and four other clones were
deposited on September 25, 1998 with the American Type
Culture Collection under accession number ATCC 98899,
from which the strain comprising the TANGO 215 cDNA clone
is obtainable. To distinguish the strains and isolate a
strain harboring the TANGO 215 cDNA clone, one can first
streak out an aliquot of the mixture to single colonies
on nutrient medium (e.g., LB plates) supplemented with
100 μg/ml ampicillin, grow single colonies, and then
extract the plasmid DNA using a standard minipreparation
procedure. Next, one can digest a sample of the DNA
minipreparation with a combination of the restriction
enzymes Sal I and Not I and resolve the resultant
products on a 0.8% agarose gel using standard DNA
electrophoresis conditions.

The digest will liberate one vector fragment of 2.7 kb
common to all strains, and one insert-specific fragment
as follows:

TANGO 215 (EpT215)    2.8 kb

The identity of the strain harboring the TANGO 215 cDNA
clone can be inferred from the fragments liberated.

Equivalents 

The contents of all references, patents and published
patent applications cited throughout this application are
hereby incorporated by reference. Those skilled in the
art will recognize, or be able to ascertain using no more
than routine experimentation, many equivalents to the
specific embodiments of the invention described herein.
Such equivalents are intended to be encompassed by the
following claims.

What is claimed is: 

1. An isolated nucleic acid molecule selected from
the group consisting of:

a) a nucleic acid molecule comprising a nucleotide
sequence which is at least 55% identical to the
nucleotide sequence of any of SEQ ID NOs:1-22, 34-43, and
_____, the cDNA insert of a plasmid deposited with
the ATCC as any of Accession Numbers 98899, 98900, and
98901, or a complement thereof;

b) a nucleic acid molecule comprising a fragment of
at least 300 nucleotides of the nucleotide sequence of
any of SEQ ID NOs:1-22, 34-43, and _____, the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof;

c) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID Nos:23-33, 54-63, and _____ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901;

d) a nucleic acid molecule which encodes a fragment
of a polypeptide comprising the amino acid sequence of
any of SEQ ID NOs:23-33, 54-63, and _____ wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID NOs:23-33, 54-63, and _____ or the
polypeptide encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

e) a nucleic acid molecule which encodes a naturally
occurring allelic variant of a polypeptide comprising the
amino acid sequence of any of SEQ ID NOs:23-33, 54-63,
and _____ or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
nucleic acid molecule hybridizes to a nucleic acid
molecule comprising any of SEQ ID Nos:1-22, 34-43, and
_____ or a complement thereof under stringent
conditions.

2. The isolated nucleic acid molecule of claim 1,
which is selected from the group consisting of:

a) a nucleic acid molecule comprising the nucleotide
sequence of any of SEQ ID NO: 1-22 and 34-43, the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, or a
complement thereof; and

b) a nucleic acid molecule which encodes a
polypeptide comprising the amino acid sequence of any of
SEQ ID Nos:23-33, 54-63, and _____ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901.

3. The nucleic acid molecule of claim 1 further
comprising vector nucleic acid sequences.

4. The nucleic acid molecule of claim 1 further
comprising nucleic acid sequences encoding a heterologous
polypeptide.

5. A host cell which contains the nucleic acid
molecule of claim 1.

6. The host cell of claim 5 which is a mammalian host
cell.

7. A non-human mammalian host cell containing the
nucleic acid molecule of claim 1.

8. An isolated polypeptide selected from the group
consisting of:

a) a fragment of a polypeptide comprising the amino
acid sequence of any of SEQ ID Nos:23-33, 54-63, and
_____, wherein the fragment comprises at least 15
contiguous amino acids of any of SEQ ID Nos:23-33 and
54-63, and _____;

b) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID Nos:23-33, 54-63, and _____ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising any of SEQ ID Nos:1-22,
34-43, and _____ or a complement thereof under
stringent conditions; and

c) a polypeptide which is encoded by a nucleic acid
molecule comprising a nucleotide sequence which is at
least 55% identical to a nucleic acid comprising the
nucleotide sequence of any of SEQ ID Nos:1-22, 34-43, and
_____ or a complement thereof.



9. The isolated polypeptide of claim 8 comprising the
amino acid sequence of any of SEQ ID Nos:23-33, 54-63,
and _____ or an amino acid sequence encoded by the
cDNA insert of a plasmid deposited with the ATCC as any
of Accession Numbers 98899, 98900, and 98901.

10. The polypeptide of claim 8 further comprising 
heterologous amino acid sequences. 

11. An antibody which selectively binds to a
polypeptide of claim 8.

12. A method for producing a polypeptide selected from
the group consisting of:

a) a polypeptide comprising the amino acid sequence
of any of SEQ ID Nos:23-33, 54-63, and _____ or an
amino acid sequence encoded by the cDNA insert of a
plasmid deposited with the ATCC as any of Accession
Numbers 98899, 98900, and 98901;

b) a polypeptide comprising a fragment of the amino
acid sequence of any of SEQ ID Nos:23-33, 54-63, and
_____ or an amino acid sequence encoded by the cDNA
insert of a plasmid deposited with the ATCC as any of
Accession Numbers 98899, 98900, and 98901, wherein the
fragment comprises at least 15 contiguous amino acids of
any of SEQ ID Nos:23-33, 54-63, and _____ or an amino
acid sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901; and

c) a naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of any of
SEQ ID Nos:23-33, 54-63, and _____ or an amino acid
sequence encoded by the cDNA insert of a plasmid
deposited with the ATCC as any of Accession Numbers
98899, 98900, and 98901, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a
nucleic acid molecule comprising the nucleotide sequence
of any of SEQ ID Nos:1-22, 54-63, and _____ or a
complement thereof under stringent conditions;

comprising culturing the host cell of claim 5 under
conditions in which the nucleic acid molecule is
expressed.



13. A method for detecting the presence of a
polypeptide of claim 8 in a sample, comprising:

a) contacting the sample with a compound which
selectively binds to a polypeptide of claim 8; and

b) determining whether the compound binds to the
polypeptide in the sample.

14. The method of claim 13, wherein the compound which
binds to the polypeptide is an antibody.

15. A kit comprising a compound which selectively 
binds to a polypeptide of claim 8 and instructions for 
use. 

16. A method for detecting the presence of a nucleic
acid molecule of claim 1 in a sample, comprising the
steps of:

a) contacting the sample with a nucleic acid probe or
primer which selectively hybridizes to the nucleic acid
molecule; and

b) determining whether the nucleic acid probe or
primer binds to a nucleic acid molecule in the sample.

17. The method of claim 16, wherein the sample
comprises mRNA molecules and is contacted with a nucleic
acid probe.

18. A kit comprising a compound which selectively 
hybridizes to a nucleic acid molecule of claim 1 and 
instructions for use. 

19. A method for identifying a compound which binds to
a polypeptide of claim 8 comprising the steps of:

a) contacting a polypeptide, or a cell expressing a
polypeptide of claim 8 with a test compound; and

b) determining whether the polypeptide binds to the
test compound.

20. The method of claim 19, wherein the binding of the
test compound to the polypeptide is detected by a method
selected from the group consisting of:

a) detection of binding by direct detection of the
binding of the test compound to the polypeptide;
and

b) detection of binding using a competition binding
assay.

21. A method for modulating the activity of a
polypeptide of claim 8 comprising contacting a
polypeptide or a cell expressing a polypeptide of claim 8
with a compound which binds to the polypeptide in a
sufficient concentration to modulate the activity of the
polypeptide.

22. A method for identifying a compound which
modulates the activity of a polypeptide of claim 8,
comprising:

a) contacting a polypeptide of claim 8 with a test
compound; and

b) determining the effect of the test compound on the
activity of the polypeptide to thereby identify a
compound which modulates the activity of the polypeptide.

GTCGACCCACGCGTCCGCGTGGATATGGAGCTGGCTGCTGCCAAGTCCGGGGCCCGCGCCGCTGCCTAGCGCGTCCTGG 79

MALL 4 

GGACTCTGTGGGGACGCGCCCCGCGCCGCGGCTCCCGGACCCGTACAGCCCGCCGCTGCGCGC ATG GCC CTG CTC 154 

SRPALTLLLLLMAAVVRCQE 24 
TCG CGC CCC GCG CTC ACC CTC CTG CTC CTC CTC ATG GCC GCT GTT GTC AGG TGC CAG GAG 214 

QAQTTDWRATLKTIRNGVHK 44
CAG GCC CAG ACC ACC GAC TGG AGA GCC ACC CTG AAG ACC ATC CGG AAC GGC GTT CAT AAG 274

IDTYLNAALDLLGGEDGLCQ 64
ATA GAC ACG TAC CTG AAC GCC GCC TTG GAC CTC CTG GGA GGC GAG GAC GGT CTC TGC CAG 334

YKCSDGSKPFPRYGYKPSPP 84
TAT AAA TGC AGT GAC GGA TCT AAG CCT TTC CCA CGT TAT GGT TAT AAA CCC TCC CCA CCG 394

NGCGSPLFGVHLNIGIPSLT 104
AAT GGA TGT GGC TCT CCA CTG TTT GGT GTT CAT CTT AAC ATT GGT ATC CCT TCC CTG ACA 454

KCCNQHDRCYETCGKSKNDC 124
AAG TGT TGC AAC CAA CAC GAC AGG TCC TAT GAG ACC TGT GGC AAA AGC AAG AAT GAC TGT 514

DEEFQYCLSKICRDVQKTLG 144
GAT GAA GAA TTC CAG TAT TGC CTC TCC AAG ATC TGC CGA GAT GTA CAG AAA ACA CTA GGA 574

LTQHVQACETTVELLFDSVI 164
CTA ACT CAG CAT GTT CAG GCA TGT GAA ACA ACA GTG GAG CTC TTG TTT GAC AGT GTT ATA 634

HLGCKPYLDSQRAACRCHYE 184
CAT TTA GGT TGT AAA CCA TAT CTG GAC AGC CAA CGA GCC GCA TGC AGG TGT CAT TAT GAA 694

EKTDL 190
GAA AAA ACT GAT CTT TAA 712 

AGGAGATGCCGACAGCTAGTGACAGATGAAGATCCAACAACATACCTTTCACAAATAACTAATGTTTTTACAACATAAA 791 

ACTGTCTTATTTTTGTGAAAGGATTATTTTGAGACCTTAAAATAATTTATATCTTGATGTTAAAACCTCAAAGCAAAAA 870

AAGTGAGGGAGATAGTCAGGGGACCGCACCCTTCTCTTCTCAGGTATCTTCCCCACCATTGCTCCCTTACTTAGTATCC 949

CAAATGTCrTGACCAATATCAAAAACAAGTCCTTGTTT^ 1028 

ACAACCACATTTACCAAAAAAAGAGATCAAATATAAAATTCATCATAATGTCrcTTCAACATTATCTTATTTGGAAAA^ 1107 

GGGGAAATTATCACTTACAAGTATTTGTTTACTATGAAATTTTAAATACA 1186 

AAAAAAACCGCCCCCCC 1203
GTCGACCCACGCGTCCGGGCCGGGGTCCTGAGCCGGAGCCGGAGCGCGCGCCGCTGCCCAGCCCCGCCGCGCCGGCCCC 79

MVTPRPAPARGPALLLLL 18 
GCAG ATG GTG ACT CCG CGG CCC GCG CCC GCC CGG GGC CCC GCG CTC CTC CTC CTC CTG 137

LLATARGQEQDQTTDWRATL 38 
CTG CTG GCC ACT GCG CGC GGG CAG GAA CAG GAC CAG ACC ACC GAC TGG AGG GCC ACC CTC 197 

KTIRNGIHKIDTYLNAALDL 58 
AAG ACC ATC CGC AAC GGC ATC CAC AAG ATA GAC ACG TAC CTC AAC GCC GCG CTG GAC CTG 257

LGGEDGLCQYKCSDGSKPVP 78 
CTG GGC GGG GAG GAC GGG CTC TGC CAG TAC AAG TGC AGC GAC GGA TCG AAG CCT GTT CCA 317 

RYGYKPSPPNGCGSPLFGVH 98 
CGC TAT GGA TAT AAA CCA TCT CCA CCA AAT GGC TGT GGC TCT CCA CTG TTT GGC GTT CAT 377 

LNIGIPSLTKCCNQHDRCYE 118
CTG AAC ATA GGT ATC CCT TCC CTG ACC AAG TGC TGC AAC CAG CAC GAC AGA TGC TAT GAG 437

TCGKSKNDCDEEFQYCLSKI 138
ACC TGC GGG AAA AGC AAG AAC GAC TGT GAC GAG GAG TTC CAG TAC TGC CTC TCC AAG ATC 497

CRDVQKTLGLSQNVQACETT 158 
TGC AGA GAC GTG CAG AAG ACG CTC GGA CTA TCT CAG AAC GTC CAG GCA TGT GAG ACA ACG 557 

VELLFDSVIHLGCKPYLDSQ 178
GTG GAG CTC CTC TTT GAC AGC GTC ATC CAT TTA GGC TGC AAG CCA TAC CTG GAC AGC CAG 617 

RAACWCRYEEKTDL * 193 
CGG CCT GCA TGC TGG TGT CGT TAT GAA GAA AAA ACA GAT CTA TAA 662 

AGACCCTGACTGCTCGAGACCACGCGAGAATGGAGGATCATCCTTGCCAAAGATCCGATGCTTTAACAGCCTAATGTTG 741 

CCTTAGTTTTGTGTCGATGCCTCATTTTGACACCTTTCTATACTG 820 

GGGGGCCAGGCAGAAACAGAGGGAGAGCATGCTTCCGATGCGGAGCCAGCAGGACATCCAAGAGCATGCCTTCCTCAG A 8 99 

CTCCCTCTC7TCGTCGCTCCCCCAAACTGGG AAGAAAAGCTTAAGCTCGTGTCACTTGGTGTTCATAGTTGTACTTAAC 97 8 

AATAAAAATGAAAGCAAATGTAAAATTCATTGTAACGACTTTTCACCATTATTTTATTTTC 1057 

CCTTAGAACTATTATTTATTTTGAAATTTCAGATGTACATTTATACCTGGAAAAACTATTAATTCTCCATTTTTATTAT 1136 



ACATAATGTGTTCTTTCTCTGAAGCCCACTAAC ATACCTATAAATATGTTACTCAAAACTACACCCTTTCCAAATCTGC 1215 
ATCrCTTGTACAGTTCGAATCACGGTTGGTACTTCTCTCGAGACACGCCCCACCACATCTGAGTGTTCGGATGTCCACA 12 94 
GAATTCAGAACCCCAGCTTCCTGTCTCACAAACCCCTrAGAGTGAATGTCCTTCCTCTCCTGCTGTGAGCTCTAGGAAT 13 73 
OACCGGTTT^CCCGCCAAGCCCAGCTCTGAATCAGTGCGCTATCTCCTCCTCAGGTTGTGGTTACTCCCTCATCCCCG 14 52 
TTTTCCATCTTCTATCCTCCAGTAGTGTTAAAAGTCTGACATTTTCTAATGGAGCTCTTAATAAAAGCTATTTACTTCT 1531 



raCTAAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCG 1570 
ACCACCCGTCCGCCCACGCGTCCGGTCGCGTGCTGAGGGGTGTGACGGTTTTCTTGCTCGTGGGCTCGGACGAGTACGG 79

MAQLGAVVAV 10 
AGCGCCTGCAGGG ACAG CCTGGATAAAGG CTCACTG ATG GCT CAG TTG GGA GCA GTT GTG GCT GTG 145 

ASSFFCASLFSAVHKIEEGH 30
GCT TCC AGT TTC TTT TGT GCA TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT 205 

IGVYYRGGALLTSTSGPGFH 50
ATT GGG GTA TAT TAC AGA GGC GGT GCC CTG CTG ACT TCG ACC AGC GGC CCT GGT TTC CAT 265 

LMLPFITSYKSVQTTLQTDE 70
CTC ATG CTC CCT TTC ATC ACA TCA TAT AAG TCT GTG CAG ACC ACA CTC CAG ACA GAT GAG 325 

VKNVPCGTSGGVMIYFDRIE 90
GTG AAG AAT GTA CCT TGT GGG ACT AGT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA 385 

VVNFLVPNAVYDIVKNYTAD 110 
GTG GTG AAC TTC CTG GTC CCG AAC GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCT GAC 445 

YDKALIFNKIHHELNQFCSV 130
TAT GAC AAG GCC CTC ATC TTC AAC AAG ATC CAC CAC GAA CTG AAC CAG TTC TGC AGT GTG 505 

HTLQEVYIELFDQIDENLKL 150
CAC ACC CTT CAA GAG GTC TAC ATT GAG CTG TTT GAT CAG ATT GAT GAA AAT CTC AAA CTG 565 

ALQQDLTSMAPGLVIQAVRV 170
GCT TTG CAA CAG GAC CTG ACC TCC ATG GCC CCT GGG CTG GTC ATT CAA GCT GTG CGG GTA 625 

TKPNIPEAIRRNYELMESEK 190
ACA AAG CCC AAC ATA CCA GAG GCA ATC CGC AGA AAC TAC GAG TTG ATG GAA AGT GAG AAG 685 

TKLLIAAQKQKVVEKEAETE 210
ACA AAG CTT CTC ATT GCC GCC CAG AAA CAG AAG GTG GTG GAA AAG GAA GCA GAG ACA GAG 745

RKKALIEAEKVAQVAEITYG 230
CGG AAG AAG GCG CTC ATT GAG GCA GAA AAA GTG CCC CAG GTG GCT GAG ATC ACC TAC GGG 805 

QKVMEKETEKKISEIEDAAF 250
CAG AAG GTG ATC GAG AAG GAG ACT GAG AAG AAG ATT TCA GAA ATT CAA GAT GCT GCA TTT 865 

LAREKAKADAECYTAMKIAE 270 
CTG GCC CCG GAG AAG GCA AAG GCA GAT GCT GAG TGC TAC ACT CCT ATG AAA ATA CCC GAA 925 

ANKLKLTPEYLQLMKYKAIA 290
GCC AAT AAC CTG AAG CTA ACC CCT CAA TAT CTG CAG CTG ATG AAG TAC AAG GCC ATT GCT 985 

SNSKIYFGKDIPNMFMDSAG 310
TCC AAC AGC AAG ATT TAC TTT CGC AAA GAC ATT CCT AAC ATG TTC ATG GAC TCT GCG GCC 1045

SVSKQFEGE.ADKLSFGLEDE 330 
AGT GTG AGC AAG CAG TTT GAG GCG CTA GCT CAC AAG CTA AGC TTT CGC TTA GAA CAT GAA 1105 

PLSTATKEN* 340 
CCC TTG GAC ACC GCC ACT AAG CAC AAT TGA 1135 
AAAAAACTTGATATGACTGCAAATGATACTTAAGOVGATOT 1214
G A CT AC CTT CT CT G ACTG T CTTC CAG TT ACTGTG GTGAAAAAG AAG AAAT G AA CT T AAATC CACTCC CTTTCT AGGGAA 1293 
AGGAGGGTGGGGACTGATGATGGGGGGTTTTATTTCAGGTAAGCAGTTTATATGACTTCCAATAAGATTTGTAAATCAT . 13 72 
GGGCTTGACCTTTG A CCTCTAGACACTAATTTTAT CCTTTG AGGCTGGCTTAATTAGGGATG CTGTCATTAAGGAGAGG 1 4 S 1 
GAGAAATGTAGAGTGTTACCTCCAACTCATTTG ATTTCCCTTACTTGGGAAAATGCAGTCCAGTGTTCTCACCTCTGCC 1530 
TCCAAGGTAGGAGATGTCTGTGGGTGAGGCTC\GCAACTGAGCAAATATGTGCCTGTGAGTTTGCCAGTAGAGCTGTGA 1609 
AGAAACAGCTGCAGAGAACATTTGACCTTCCTGGCATTCTTGTCTGCATGTGTGTGAGTTATTTTAGAGGTGTGCTTTC 1688 
TTGAGCCCTCATAAGGAAGTACTGGTGCTAGGTTTTGCAAGATTTTGTATACACTTTGCTCCTTGCCCT 1767 
GTGGTGGTTTCTGACTACATTTCTAGAGTCAGAGCTTGATCACCACAACTCAATTATT^ 184 6 

TGTCATTTGTTTTTTTTTTT7CT7CTCAAAAATTCTGTTCATTGGTTCCACTCAGCATCAAGAAGA 1925 
CTCAAGTGTCTTAACAGCTGCTGGAGTGGGATCCTTGTTATCTCTTAGCCACTGCAGGACCTGCCTGACAGGTTATGTG 2004 
TGCACCTCGAGATGAAGTGTCTTTCTATTATTGTAGAGATTCTGTAGTGAAGAGGTCTGACACCATGTGTGGAGGAGGA 2083 
GG AACG AT CAGTCAAG AGATGTCCTG GT CTT AATGC CTGTGGCTTGTG CTGGGAGTGGGTCTGACTT AGTG ATAAAAGG 2162 
ACTCTATTCACTAAGTAG CCTGTG TTTTTAAATCCAGGG CTG CAGGCAGCAACGCAAGTCAGGCTGAACATTCAGTCTC 2241 
CAGAGACAGCTGTGTGGAGCAAATCAGAGTTCATGCCCAAGTCCCCAGGTTGGAATGGCTGTGCC^ 2320 
GGGTTTTCTTTTTCATTACTAGGTCAGAACATTTTGAGTCACCrTGGGAGATTCAGGATGGGGAGAGCAAATTTGAACA 2399 
AAAGGTTTTTCTTATATCCTG AGATTGAGGGGTAGGGGGTGTCCAACCTGTATAGCCCATGGGTTGTGTCTAGAATTAA 2478 
GTGGAGGGCAGCTATCTGGAGTTAACTTGCAAGCATATTGGTGCCCTCCATGACCACCTCTGGCTTAGGACTTGGCCCT 2557 
GTTATGAGCTGACCCCCACCCCCCACCCCCCACCCCCCCCCCCGCCAACTCCTATACCTATCTTCCCTAGGTGAATCTG 2636 
TG AATGGTCCTTT CTCC CAG CAATCCCTG CCTTCTTTTTGGGCCCATGCC CAGACTTCTGGTTT AAGG AATGGTCCCAG 2715 
AG CTTGG G C C ACCTTG CT CAG AAGTTTTCCG AG C ATTGAGC CTGCCT AG AAAG AT ACAG TGTT AG CT CCC CTTACTTCA 2794 
AAGTTCCCCTTCTCTGTTCWGACTCCTGGCACTTCTCGTCCTGGGCACACTTTTTGCACGCAACAAAATGTCCCTGGG A 2873 
GTGATGCATTTTAATGTGCTCCACAGTCCn'TTCAGAAGGTGGTCATTTCCCTTCGCCGGGCGCGGTGGCTCACACCTGT 2952 
AATCCCACCACTTTGGGAGGCCAAGGCAGGCGGATCACCTGAGGTTAGGAGTTCGACACCACCCTGGCCAACATGCGAA 3031 
ACCCCATCTCTACGAAAAATAGAAATATTAGCCGGGCATGGTGTCAGGCACCTGTAATCCCAGCTACTTGGCAGGCTGA 3110 
GGCAGGAGAATTGCTTGAACTCGGGACGCAGAGGTTCCAGTGAGCCAAGATCATGCCATCCCACTCTAGCTTGGGCAAT 3189 
AGAGCAAGGCTCCCTCTCAAGAAAAGAAGGTCATTTCCCAAGACTAGCATAGGGAGTATCCATTTAAAATACATTCATC 3 268 
TTCCTCCCATTTCCGTGCTATTAATCACTTCTTAGAGCAACATGACAATGCCCAGCATCCCACTTCCCGAAAATGTCTA 3 347 
CTCCTTCTACTCTGACCTCTTGTTGCCTACACCTCAGAAAACACCAATTCACCACAGTAGAACCGCGAGCAGGGATAGC 34 26 
TCAGCTTCTCTCAATACCACACTTTGCTCAGGTCTTAACTTGACGCCCTCTCCGGTACTAACATCCTGCCATAGCTTGT 3 505 
CCCATGAGCACAGAAGAGCCTCAGTAGAGTCAACTCCTCCTCCACCTCCCCCACCCCAAGTC 3584

AAACAAAAAATATGTTATCCTACACATTAGTGTCAATCCAATGGTTGTCTCTTATCTGCTAAATAGCAAAATC^ 3663 

ATCAGCTGTTTTATTTGCATAGGCAACTAACCTGTCTGTGTAACT 3 742 

TCTTAAAACATTTGAATTCTAAACATGTAAAA7GTGACAGCCTGCAATTTTGTAGACAGTGAAGTAATGGCT 3821 

ATAAACAGTTACTTATTTTGATAGATGTTCCATTTATCAAAATAAGTA^ 3 900 

TTCCAAGGAAAAATCACCTTGGTTG AATGTTTCTCACTCATTAAACTTTGCAGAAGTGATTCATATTCAGTACTGTTTT 3979 

TAATCACTTTTTAAAATATAAGGACCCAATGCAAGGAAACCA^ 4058 

AGATGTGGAGGGATCTGTGATCATATAAAAAGGGAGGGTTACTGAAAGAATTTTAGCAATATATTGATTCAGGAAAAGG 4137 

AGCTGTTTTATAAATGATCATTCACTGTTCCTATGGTTCTATGTATCTTTCAAACCGATACCTTTACTATTTAAAGAGC 4216 

GTAAATAGTGAAAGTAAGATGGTCATACTTACTGACTTTATCTATTTAAGTTTGATGGAGATAAACTATATCTTGGCTA 4295 

GTGGCTACTGTGTCTGTGAATGTAACCAGTACTTCTTTAAGCTCTATTCAGTAGGGTTCCAGCCACTGCTTTTTTGTTG 43 74 

TTTCTAGCCACTGTTTTTTTTTTCTTGTTTCCTTATAA^ 4 451 
GTCGACCCACGCGTCCGCGGACGCGTGGGCGCGGACTGATGGCGTCATCGAAGCGACTGGCCCGGAAGGAAGTAGGGTG 79

CTGAGGGGTGTGGCGG7TTCTACGGTTGCACGGGGGTTCGGCTGTGTACGGAGCGCCTGGAGGGACAGCCTGGATACAG 158 

MAQLGAVVAVASSFFCA 17
GTTCACTG ATG GCT CAG TTG GGA GCT GTT GTG GCC GTG GCT TCC AGT TTC TTT TGT GCA 217 

SLFSAVHKIEEGHIGVYYRG 37 
TCT CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT ATT GGA GTA TAT TAC AGA GGT 277 

GALLTSTSGPGFHLMLPFIT 57
GGT GCC CTG CTG ACC TCC ACC AGT GGC CCG GGT TTC CAT CTC ATG CTC CCG TTC ATC ACA 337 

SYKSVQTTLQTDEVKNVPCG 77 
TCC TAT AAG TCT GTA CAG ACC ACT CTC CAA ACT GAT GAA GTG AAG AAC GTA CCA TGT GGA 397 

TSGGVMIYFDRIEVVNFLVP 97 
ACC AGT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA GTG GTG AAC TTC CTG GTC CCA 457 

NAVYDIVKNYTADYDKALIF 117
AAT GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCA GAC TAT GAC AAG GCC CTC ATC TTC 517 

NKIHHELNQFCSVHTLQEVY 137
AAC AAG ATC CAT CAT GAG CTT AAC CAG TTC TGC AGC GTT CAT ACT CTT CAG GAA GTC TAT 577 

IELFDQIDENLKLALQQDLT 157
ATC GAG CTG TTT GAT CAA ATT GAT GAA AAC CTC AAG TTG GCT TTG CAG CAG GAC CTG ACT 637 

SMAPGLVIQAVRVTKPNIPE 177
TCC ATG GCC CCT GGG CTG GTT ATC CAA GCT GTG CGA GTG ACA AAG CCC AAT ATA CCT GAG 697 

AIRRNYELMESEKTKLLIAA 197 
GCA ATC CGC AGG AAC TAT GAG CTG ATG GAA AGC GAG AAG ACG AAG CTT CTC ATT GCA GCC 757 

QKQKVVEKEAETERKKALIE 217
CAG AAG CAG AAG GTG GTG GAA AAG GAG GCA GAA ACA GAG AGG AAG AAG GCC CTC ATT CAG 817 

AEKVAQVAEITYGQKVMEKE 237
GCA GAA AAA GTG GCA CAG GTT CCA GAA ATC ACC TAT GGG CAA AAG GTG ATG GAG AAC GAG 877 

TEK
ACA GAG AAG
MNMTQARV 8
GTCGACCCACGCGTCCGGCGGCTGGGCTTCTTCTCAGAGGAACGAGA ATG AAT ATG ACT CAA GCC CGG GTT 71 

LVAAVVGLVAVLLYASIHKI 28 
CTG GTG GCT GCA GTG GTG GGG TTG GTG GCT GTC CTG CTC TAC GCC TCC ATC CAC AAG ATT 131 

EEGHLAVYYRGGALLTSPSG 48
GAG GAG GGC CAT CTG GCT GTG TAC TAC AGG GGA GGA GCT TTA CTA ACT AGC CCC AGT GGA 191

PGYHIMLPFITTFRSVQTTL 68
CCA GGC TAT CAT ATC ATG TTG CCT TTC ATT ACT ACG TTC AGA TCT GTG CAG ACA ACA CTA 251

QTDEVKNVPCGTSGGVMIYI 88 
CAA ACT GAT GAA GTT AAA AAT GTG CCT TGT GGA ACA AGT GGT GGG GTC ATG ATC TAT ATT 311 

DRIEVVNMLAPYAVFDIVRN 108
GAC CGA ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAT ATC GTG AGG AAC 371

YTADYDKTLIFNKIHHELNQ 128
TAT ACT GCA GAT TAT GAC AAG ACC TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG 431 

FCSAHTLQEVYIELFDQIDE 148 
TTC TGC AGT GCC CAC ACA CTT CAG GAA GTT TAC ATT GAA TTG TTT GAT CAA ATA GAT GAA 491

NLKQALQKDLNLMAPGLTIQ 168 
AAC CTG AAG CAA GCT CTG CAG AAA GAC TTA AAC CTC ATG GCC CCA GGT CTC ACT ATA CAG 551 

AVRVTKPKIPEAIRRNFELM 188
GCT GTG CGT GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAG TTA ATG 611 

EAEKTKLLIAAQKQKVVEKE 208
GAG GCT GAG AAG ACA AAA CTC CTT ATA GCT GCA CAG AAA CAA AAG GTT GTG GAA AAA GAA 671 

AETERKKAVIEAEKIAQVAK 228
GCT GAG ACA GAG AGG AAA AAG GCA GTT ATA GAA GCA GAG AAG ATT CCA CAA GTC GCA AAA 731 

IRFQQKVMEKETEKRISEIE 248
ATT CGG TTT CAG CAG AAA GTG ATG GAA AAA GAA ACT GAA AAG CCC ATT TCT GAA ATC GAA 791 

DAAFLAREKAKADAEYYAAH 268 
GAT GCT GCA TTC CTG CCC CGA GAG AAA GCG AAA GCA GAT GCT CAA TAT TAT GCT CCA CAC 851 

KYATSNKHKLTPEYLELKKY 288
AAA TAT GCC ACC TCA AAC AAG CAC AAG TTG ACC CCG GAA TAT CTG GAG CTC AAA AAG TAC 911 

QAIASNSKIYFGSNIPNMFV 308
CAG GCC ATT GCT TCT AAC AGT AAG ATC TAT TTT GCC ACC AAC ATC CCT AAC ATG TTC GTG 971 

DSSCALKYSDIRTGRESSLP 328
CAC TCC TCA TGT GCT TTG AAA TAT TCA GAT ATT ACG ACT CGA ACA CAA ACC TCA CTC CCC 1031

SKEALEPSGENVIQNKESTG 348
TCT AAG GAG CCT CTT CAA CCC TCT GCA GAG AAC GTC ATC CAA AAC AAA GAG AGC ACA CCT 1091 



* 349
TGA 1094
TGC^GAGGTGGAAATGTTCTCC^TATCAAGATGTCGCCCAAGGG 1173
GATTTACAGAGAACTTACACTTCATCTCTTCCACCTC^^ 1252 
CCAGCTGTCTGACACACAAATGGTCTTTTCAGCCACAGTCTTATCAAGTATCCTATATGTATT 1331 
CTCATGAATGAGGAAAGTCTGATGCTAAGATACTGCCTGCATTCCCTGCATTGGGTTGATGACTGTCAGCATCACTGCC 1410 
GCAGGCCATGCTTGACTAAGGTACCTGGTTTTAGCCACAGCCACCTCCTTGTATGTTACCTTTCAGCTCTGGCCAAGAG 1489 
TGGGACAGGGTTTTAACCACAAATAGGAGCAGCATGCAATTCCTAGTGACTTGCTGCACAGTATTGTATCATAATTACA 1568 
GGAAGTTTTTATTTTTAAAACTGGATCTGGGGTATATTCATTTGCCCCATCACCTCTGTCTAAAGGCCCAAGTCCTAGG 1647
GCTGCCATGGTCACAAGCACACTGATGCTCCTTAAGATTGTTTATCTGGAGCCCACATAGTGTGGAACAAAAAGTCACC 1726
TAGAAAGCATCCITGGTCATCATTGTCrCCTTCCCAC^^ 1805 
CACCTCCCCCAGGAGATCAGGATTCCACTGACGTCCTGGGCAGCCAGTGAATTTA^ 1884 
TAACCTGTGGCATTAGGAGACCTACTTCATGTGGACCCTTTTTTTCCTTCAGTTTAACTTTTCTGGAGCAGTGTGCTGC 1963
GTAGTTCGGCCTGAGTTTGTGCAGCTTGTTAAGACAACTCTTGTGTACACTATGTTGAAGCTCAACAAAAAAGTCATGG 2042 
GACCACTTCTAGAAATCTTTCAGCTGTCAGGCCTGTCAGTCTCATGACAGTTTGTTGGTTGTGCCAAACACTTTATTTG 2121 
GGAAAGGAAAGCCCAGATTTGAATGGGTCTTTCCCCTGGGCCTTATCCTATAGAGGCATTTGTAATATGGAGAAAATAA 2200 
TTTTTCATTTTTGCTCATTTAATTCTATAAATTCTCTTTATAAATGAATTTTGTGTTCTTTAGTTCTCCTTAAAA 2279 
TTTTGAATTATAAAAATAAAATCTTTACCTGTCGAATTGTTGCTGCAGATGATTGTTGTGGAAAATCTGGATCATTGAC 2358
CTCTGTGCTTTCATTCCTAGAGATGTTTTATAGTTACATGAGCAAAAGCTGTTGCCCCAAAGTGATGGCCCTGGAGGCG 2437
GGGCTGAGGAACAGGGAAATGCCGCTGTGAAGTCTTAAAGCACTTCTGCTTAAACTCCATGTGTGAGGAGTGTGCCTCC 2516 
CTGTGCCCTCTCAGCTCTGAGGCTGGCCGTCTTTCGGGGTGTTCCTTTTGGCAAATATACACTGTAATCTTGAGTCT^ 2595 
ATTTATATGTTGAAATGCTACCTTTTTTAAAATAAGAAACTAAATAAAATTATTTTACTATCAAAAAAAAAAAAAAAAA 2674
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2704 
MIYIDRI 7
GTCGACCCACGCGTCCGTAAAAATGTGCCTTCTGGAACAAGTGGTGGAGTC ATG ATC TAT ATT GAC CGA ATA 72 

EVVNMLAPYAVFDIVRNYTA 27
GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAC ATT GTG AGG AAC TAT ACT GCA 132 

DYDKTLIFNKIHHELNQFCS 47 
GAC TAC GAC AAG ACT TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG TTT TGC AGT 192 

AHTLQEVYIELFDQIDENLK 67
GCC CAC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA AAC CTG AAG 252 

QALQKDLNTMAPGLTIQAVR 87
CAG GCC CTG CAA AAA GAT TTA AAC ACC ATG GCC CCA GGT CTC ACT ATC CAG GCT GTG CGT 312 

VTKPKIPEAIRRNFELMEAE 107
GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAA TTA ATG GAG GCA GAG 372

KTKLLIAAQKQKVVEKEAET 127
AAG ACA AAA CTT CTC ATA GCT GCA CAG AAA CAA AAG GTG GTG GAG AAA GAA GCT GAG ACG 432 

ERKRAVIEAEKIAQVAKIRF 147
GAG AGG AAA AGG GCT GTT ATA GAA GCA GAG AAG ATT GCA CAA GTA GCA AAA ATT CGA TTT 492

QQKVMEKETEKRISEIEDAA 167
CAA CAG AAA GTG ATG GAG AAA GAA ACT GAA AAA CGC ATT TCT GAG ATT GAA GAT GCT GCG 552 

FLAREKAKADAEYYAAHKYA 187
TTC CTG GCC CGA GAG AAG GCA AAA GCA GAT GCC GAG TAT TAC GCT GCA CAC AAA TAC GCC 612 

TSNKHKLTPEYLELKKYQAI 207
ACC TCA AAC AAG CAC AAA CTG ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC CAG GCC ATT 672 

ASNSKIYFGSNIPSMFVDSS 227
GCC TCA AAC AGT AAG ATC TAC TTT CCC AGC AAC ATC CCC AGC ATG TTT GTG GAC TCC TCC 732

CALKYSDGRTGREDSLPPEE 247
TGT CCT CTG AAA TAC TCT GAT CGT AGG ACT GCG AGA GAA GAC TCC CTT CCC CCA GAG GAG 792 

AREPSGESPIQNKENAG* 265
GCC CGT GAG CCC TCT GGA GAG AGC CCC ATC CAA AAC AAG GAG AAC GCA GGT TCA 846

TGCAACACCTGGAAATGTTCTCCCATATCAACATCCGACCCAAGGGGCTAAGTGGGAACAGTGGTTATGTGGACTCGTA 925

AGATTCACAGAGAATGTGTGCTCTGTTGTGATTCTCTTGTCATAGTCCTGGTTTGCCAGCTGACTACAGGATAGACCCA 1004 

GCTGTCTGCCACTCAAACGGTCTCTGCAGCCACAGTTTTATCAAGTATCCTGTATGTGTTCCrrTTGTAAACCGGTACTC 1083 

ATCAATGAGGGAAAGTCTGATGCTAAGATACTGCCTGCACTGGAATGTCAAACACTATATAACAAGCTGTGGTTTTTAA 1162 

AAGCTATTGAATAATGTTTACATTGCTCCCTGAGGACATGTGTGCTCAGACATTCAAGAGCTAGGAGGCCAGAGAGAAG 1241 

ACCrrCAGAAAACGGTAAGTTAAACAAGACAAGTGTCATCAGACACTTGGCACCCGGGCTCTCTTTAAAGTCTAGTCCC 1320

GCCATTCCTCCATGTGATTOACACCCACACCTCTCCGTTCCCACGAAATTATCTTCCAGTTCAATCACCATTTACTTGA 1399
TACAAATTGTACCTTTCTGTTTTTCTAGTCAGGTTGGTGGCCT^ 1478
CTCGAAGATATTCCCAATCACTAGTTTATTCCGTTAGGAGACT 1557 
AAAGCCTGCACTGCACCAAAGCTACGGGTCCCTGTGTTTCCTCTATTCAGTGATGTCATCAACCTCACTGTCCCAGCCC 1636
ATGTGTGACTAAAGTGCCCGGTTTTAGCCACAGACAACTGCTTAGATGTCACCTCTTGGCTGACCAAAGCTGGGACAGG 1715
GCTTTAACCAGACATAGGAGCAGTGTGCAATTCCTGATTCACTGCACAGTATTATGTCATAATTGCAGGAATTA 1794 
TGTTTTTAAAACTGGATTTGGGGCACATTCATTCACCCCAACACTTCTATCTAAAGGCCAAGGTTCTAGGGCTGCTATG 1873
GTCACTAACACACTGATTCTCCTTAAAGTAATTCTCGAAGTGTGGAACAAAGTGACCGAGACAGCATCCTCAGTCATCT 1952
TTGTCTCCTTCCCTGGGATGCAGATACCGAAGTTGCTTTTCCAACTTTCGCCTCCGCTAGGAGATCAGAAAGAATTCT^ 2031 
GTGACTTCCTGGGCAGCCATTGAATTCATTTTCCATGAGAAGATGACAGAGTTAGCCTGTGGCTATAGGAGATCATGTC 2110 
ATCCAGACCTTTTTGCCCATCACATTAACTTTCCTGGAATATTGTGCTGCACAGGTAGACCTGAATCTGCCCAGCTTGT 2189 
TGACAGCTCTTGTGTATACTGTGTTGAAGCCAGACAGAAAAGTAATGGGGCCACTTCTGAAACCTCTCAGCTGTTGATC 2268 
TCACAGCAGCTAAAGGGTTGTGCCAAACATTTTATTAAGAAAGTAAAGCCCAGATTTGAATGGGGGTTTTCCCTAGGCC 2347
TTATAGTATAGAGGCATTTGTAATATGGAGAAAATAATTTTTCTCATTTAATTATAGAAATTACCTTCAAACAGATTTT 2426
GTGTTCTTTGGCCCTTCAAATACTGGTGTTACATTGTTGCTGCAGATAAATGATGATTGTCGTGGGATATCTGGATCAC 2505 
TGAGCTCTGTGCTTTCATTCCTAGAGATGTTTCTCATTCCCATTTAGTGAAATGCTGTTGCCCCAAAGTGATGGTTGTG 2584 
GGATTTCTTACCGGTCATAGGCCCCGGTGAGGAGCAGGGAAGCGCCATTGTGAAAGATTAAAGAAAGCACTTCCACTTG 2663
AGCTCCTTATGGAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACCATCCTTCAGCGACGTTCCTTTTGGTA 2742
AATATACACTGTAATCTTTAAGTCTAAATTTATATGTGAAAGTTAACTTTrTTTAAAAACCTAAA 2821 
TATCAAAAAAAAAAAAAAAAAGGCCGGCCG 2851 
GTCGACCCACGCGTCCGGCGGGGACAACTGGGTCITT^^ 79



PPAEANKSSEDIRCKCICPP 35
CCC CCA GCT GAA GCC AAC AAG ACT TCT GAA GAT ATC CGG TGC AAA TGC ATC TGT CCA CCT 200 

YRNISGHIYNQNVSQKDCNC 55 
TAT AGA AAC ATC AGT GGG CAC ATT TAC AAC CAG AAT GTA TCC CAG AAG GAC TGC AAC TGC 260 

LHVVEPMPVPGHDVEAYCLL 75 
CTG CAC GTG GTG GAG CCC ATG CCA GTG CCT GGC CAT GAC GTG GAG GCC TAC TGC CTG CTG 320 

CECRYEERSTTTIKVIIVIY 95
TGC GAG TGC AGG TAC GAG GAG CGC AGC ACC ACC ACC ATC AAG GTC ATC ATT GTC ATC TAC 380 

LSVVGALLLYMAFLMLVDPL 115
CTG TCC GTG GTG GGT GCC CTG TTG CTC TAC ATG GCC TTC CTG ATG CTG GTG GAC CCT CTG 440

IRKPDAYTEQLHNEEENEDA 135
ATC CGA AAG CCG GAT GCA TAC ACT GAG CAA CTG CAC AAT GAG GAG GAG AAT GAG GAT GCT 500 

RSMAAAAASLGGPRANTVLE 155
CGC TCT ATG GCA GCA GCT GCT GCA TCC CTC GGG GGA CCC CGA GCA AAC ACA GTC CTG GAG 560 

RVEGAQQRWKLQVQEQRKTV 175 
CGT GTG GAA GGT GCC CAG CAG CGG TGG AAG CTG CAG GTG CAG GAG CAG CGG AAG ACA GTC 620 

FDRHKMLS * 184 
TTC GAT CGG CAC AAG ATG CTC AGC TAG 647 

ATGGGCTGGTGTGGTTGGGTCAAGGCCCCAACACCATGGCTGCCAGCTTCCAGGCTGGACAAAGCAGGGGGCTACTTCT 726

CCCTTCCCTCGGTTCCAGTCTTCCCTTTAAAAGCCTGTGGCATTTTTCCT^ 805 

TTGGCTATTTTGATTAGGGAAGAGGCATGTGGTCrCTGATCTCCGTTGTCTTCTTGGGTCTTTGGGGTTGAAGGGAGGG 884 

CCAAGGCAGGCCAGAAGGGAATGGAGACATTCGAGGCCGCCTCAGGAGTCGATGCGATCTGTCTCTCCTGGCTCCACTC 963

TTGCCGCCTTCCAGCTCTGAGTCTTCGGAATGTTGTTACCCT^ 1042

GGGAGGAAAGCATGGCCCAGCATTCAGCATCTGTTCCTTTCTGCAGTGGTTCTTTATCACCACCTCCCTCCCAGCCCCA 1121 

GCGCCTCAGCCCCAGCCCCAGCTCCACCCCTCAGGACACCTCTGATGGGAGAGCTCGGCCCCCTGAGCCCACTGGGTCT 1200 

TCAGGGTGCACTGGAAGCrGGTCTTCGCTGTCCCCTGTGCACTTCTCGCACTGGGGCATGGAGTGCCCATGCATACTCT 1279 

GCTGCCGGTCCCCTCACCTGCACTTGAGGGGTCTGGGCAGTCCCTCCTCTCCCCAGTGTCCACAGTCACTGAGCCAGAC 1358 

GGTCGGTTGGAACATGAGACTCCACCCTGAGCCTCGATCTGAACACCACAGCCCCTGTACTTCCGTTGCCTCTTGTCCC 1437

TGAACrTCGTTCTACCAGTCCATGCAGAGAAAy\TTTTGTCCTCTTGTCTTAGAGTTCTGTGTAAATCAAGGAAGCCATC 1516 



MKLLSLVAVVGCLLV 15
AGCAAGCCTGATAAGC ATG AAG CTC TTA TCT TTG GTG GCT GTG GTC GGG TGT TTG CTG GTG 140



ATTAAATTGTTTTATTTCTCAAAAAAAAAAAAAAAAAAAAGGGCCGCCG 1565 
GTCGACCCACGCGTCCGGCCTGCTGATCAGTGGCCGCTGCGGCTGAGCTTGCAGGCATCTAGTCTTGCTGGCTCAGCAA 79

MKLLCLVAVVGCLLVPP 17 
GCCCGATAAGC ATG AAG CTG CTG TGT TTG GTG GCT GTG GTG GGG TGC TTG CTG GTG CCC CCA 141 

AQANKSSEDIRCKCICPPYR 37 
GCT CAA GCC AAC AAG AGC TCT GAA GAT ATC CGG TGC AAA TGC ATC TGT CCG CCT TAC AGA 201 

NISGHIYNQNVSQKDCNCLH 57
AAC ATC AGC GGG CAC ATT TAC AAC CAG AAT GTG TCT CAG AAG GAC TGC AAC TGC CTG CAT 261 

VVEPMPVPGHDVEAYCLLCE 77
GTG GTG GAG CCC ATG CCA GTG CCT GGC CAC GAT GTG GAA GCC TAC TGC CTG CTC TGC GAG 321 

CRYEERSTTTIKVIIVIYLS 97 
TGT AGG TAC GAG GAG CGT AGC ACC ACA ACC ATC AAG GTC ATT ATT GTC ATC TAC CTG TCT 381 

VVGALLLYMAFLMLVDPLIR 117
GTG GTG GGG GCC CTC TTA CTC TAC ATG GCC TTC CTG ATG CTG GTG GAC CCG CTC ATC CGG 441 

KPDAYTEQLHNEEENEDART 137 
AAG CCA GAT GCC TAT ACT GAG CAG CTG CAC AAT GAA GAG GAG AAT GAG GAT GCT CGC ACC 501 

MATAAASIGGPRANTVLERV 157
ATG GCA ACA GCC GCT CCG TCC ATT CGA GGA CCC CGG GCA AAC ACT GTC CTG GAG CGG GTG 561 

EGAQQRWKLQVQEQRKTVFD 177 
GAA CGC GCT CAG CAG CGG TGG AAG CTG CAG GTG CAG GAG CAG CGG AAG ACG GTC TTC GAC 621 

RHKMLS * 184
CGA CAC AAG ATG CTC AGT TAG 642 

ATGGTTGCCATGATTGCATCAGAGACCTGGGCCATGGCTACCAGCTTCTGCGGCTCACTGCAGTCTTCCCTGGGTCTTC 721
CCTTCAAATGCCCATCCCGTTTATCCTTCTCCCTCTCTAGAAATGTACTCGACTGTTATAACGAGGGAGTGTGATTGGG 800 



TCTCTGTACGTCTCTGGGCGGTAGAGGCGACCGGACGGAAGGCAGAAGGGAACAGAGACATTTCACGTGGCCACATGAT 879 
TGGGTGGAATTCATCCCTCCTGTCTTCACCATTCCTCCCAGCTCCACATCTTAAGGATGCTTACGGGAGACGAAGCTGT 958 
GTCATCAAGAGCTCAGTGGGTCCGAGGAAAGTATCATCCAGCGCTCAGCCTTCGCTCTAGGATGCTGTGGTCCCCATTC 1037 
CCAGTTCCTTCAGTGCCAGTACTTTAACTTGGCCTACCCCAGTCTCACGAACTGTTGTGGTGCCCCTGAGCCCACAGTC 1116
ATCTCCAGAGTCCACCTCGAAGCCTGTTCCCCTCTCCTCGCCTCCTGGTCCACCAGTGCATGGCAGTGCCCATGCATGC 1195
CGGCATATTCAGCAGCTGTCACCTTACTCCCATCCCACGAGCCCCTAACGCCTCCCACCTCTCCCCTCTGACTCCACCT 1274
GCTGAGCCATAAAGTTCGACCATATGACACAAGCCCAATGGGCACCGGAGTACCATGGCTCCTGTCCTTGGATGGTCTC 1353
TTGTCCCTGAATTTCATTGTATCATCCATCGAGACAAAAAAAAAAAAAAAAAAAAAAAAAA 14 3 2 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1510 
GAATTCGGCACGAGGGGATCCCCAGCCGGGTCCCAAGCCTGTGCCTGAGCCTGAGCCTGAGCCTGAGCCTGAGCCCGAG 79

MATLWG 6
CCGGGAGCCGCTCGCGGGGGCTCCGGGCTGTGGGACCGCTGGGCCCCCAGCG ATG GCG ACC CTG TGG GGA 149 

GLLRLGSLLSLSCLALSVLL 26
GGC CTT CTT CGG CTT GGC TCC TTG CTC AGC CTG TCG TGC CTG GCG CTT TCC GTG CTG CTG 209 

LAQLSDAAKNFEDVRCKCIC 46
CTG GCG CAG CTG TCA GAC GCC GCC AAG AAT TTC GAG GAT GTC AGA TGT AAA TGT ATC TGC 269 

PPYKENSGHIYNKNISQKDC 66
CCT CCC TAT AAA GAA AAT TCT GGG CAT ATT TAT AAT AAG AAC ATA TCT CAG AAA GAT TGT 329 

DCLHVVEPMPVRGPDVEAYC 86 
GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTG CGG GGG CCT GAT GTA GAA GCA TAC TGT 389 

LRCECKYEERSSVTIKVTII 106
CTA CGC TGT GAA TGC AAA TAT GAA GAA AGA AGC TCT GTC ACA ATC AAG GTT ACC ATT ATA 449

IYLSILGLLLLYMVYLTLVE 126
ATT TAT CTC TCC ATT TTG GGC CTT CTA CTT CTG TAC ATG GTA TAT CTT ACT CTG GTT GAG 509 

PILKRRLFGHAQLIQSDDDI 146
CCC ATA CTG AAG AGG CGC CTC TTT GGA CAT GCA CAG TTG ATA CAG ACT GAT GAT GAT ATT 569 

GDHQPFANAHDVLARSRSRA 166
GGG GAT CAC CAG CCT TTT GCA AAT GCA CAC GAT GTG CTA GCC CGC TCC CGC AGT CGA GCC 629 

NVLNKVEYAQQRWKLQVQEQ 186
AAC CTG CTG AAC AAG GTA CAA TAT GCA CAG CAG CGC TGG AAG CTT CAA GTC CAA GAG CAG 689 

RKSVFDRHVVLS* 199 
CGA AAG TCT GTC TTT CAC CGG CAT GTT GTC CTC AGC TAA 728 

TTCGCAATTGAATTCAAGGTGACTAGAAAGAAACAGGCAGACAACTGGAAAGAACTGACTGGGTTTTGCTCGGTTTCAT 807 
TTTAATACCTTGTTCATTTCACCAACTGTTC 886

TGTTAACGTAATAATAGAGACATTTTTAAAACCACACAGCTCAAAGTCAGCCAATAAGTCTTTTCCTATTTGTGACTTT 965 
TACTAATAAAAATAAATCTGCCTGTAAATTATCTTGAAGTCCT 1044

TTTTAACTTGACTTTCAACATAATTTTCAGGGTTTTTGTTCTTGTTGTTTTTTGTTTGTTTCTTTTCGTGCGACACGGG 1123
ACCGATGCCTCGGAAGTGGTTAACAACTTTTTTCAAGTCACTTTACTAAACAAACTTTTGTAAATAGACCrTACCTTCT 1202 
ATTTTCGAGTTTCATTTATATTTTGCACTGTAGCCACCCTCATCAAAGAGCTGACTTACTCATTTGACTTTTCCACTCA 1281
CTGTCTTATCTGGCTATCTGCTGTGTCTCCACTTCATCGTAAACGGCATCTAAAATGCCTGGTCGCTTTTCACAAAAAG 1360
CACATTTTCTTCATCTACTCTGATGTCTGATGCAATGCATCCTAGAACAAACTCGCCATTTCCTAGTTTACTCTAAAGA 1439

CTAAACATACTCTTCCTGTGTGTCGTCTTACTCATCTTCTAGTACCTTTAAGCACAAATCCTAAGGACTTGGACACTTG 1518 
CAATAAAGAAATTTTATTTTAAAAAAAAA^ 1560
MASLW 5

GTCGACCCACGCGTCCGGGCGCGGGGCTCGGGGCTCGCAGGAGCGGCTGGCTCCCGCG ATG GCG AGC CTA TGG 73 

CGNLLRLGSGLSMSCLALSV 25
TGC GGA AAC CTG CTG CGG CTG GGC TCG GGG CTC AGC ATG TCC TGC CTG GCG CTG TCG GTG 133 

LLLAQLTGAAKNFEDVRCKC 45 
CTG CTG CTC GCG CAG CTG ACA GGC GCC GCC AAG AAT TTT GAA GAT GTG AGA TGT AAA TGC 193 

ICPPYKENPGHIYNKNISQK 65 
ATC TGC CCT CCC TAT AAA GAG AAT CCT GGG CAC ATT TAT AAT AAG AAT ATA TCT CAG AAA 253

DCDCLHVVEPMPVRGPDVEA 85 
GAT TGT GAT TGC CTT CAT GTC GTG GAG CCC ATG CCT GTA CGG GGA CCT GAT GTA GAA GCA 313 

YCLRCECKYEERSSVTIKVT 105
TAC TGT CTA CGC TGT GAA TGC AAA TAC GAA GAG AGA AGC TCT GTC ACA ATC AAG GTT ACC 373

IIIYLSILGLLLLYMVYLTL 125
ATT ATA ATT TAT CTC TCT ATT TTG GGC CTT CTG CTT CTG TAC ATG GTA TAT CTT ACC TTA 433 

VEPILKRRLFGHSQLLQSDD 145
GTT GAG CCC ATC CTG AAG AGG CGC CTC TTT GGA CAC TCC CAG CTG TTG CAG AGC GAT GAT 493

DVGDHQPFANAHDVLARSRS 165
GAC GTT GGG GAT CAC CAG CCT TTT GCA AAT GCC CAT GAT GTG CTG GCC CGC TCT CGC AGC 553 

RANVLNKVEYAQQRWKLQVQ 185 
CGA GCC AAT GTT CTA AAC AAG GTG GAG TAC GCT CAG CAG CGC TGG AAG CTC CAG GTC CAG 613 

EQRKSVFDRHVVLS* 200
GAG CAG CGA AAG TCT GTC TTC CAC CCA CAC CTT GTC CTC AGC TAA 658 

CTGGCAACTGGAATCAGGTGACTAGGAACAACACGCAGACAACTGGGAAGAATTGTCTGGGTGTCCGTGCGTTTTAATG 737

CCATGTTTGTTTTTACAAATCCTTGCTGGATGGAGGAAGACTCCAAACTGCAAGCAAACCCCATGCTTGGTATTTTCCT 816 

GTTAATATATTAATACAGACATTTTTACAGCACACAGTTCCAAGTCAACCAGTAAGTCTTTTCCTACTTGTGACTTTTA 895

CTAATAAAATTAAGCTGCCTCTGACTTATCTTGAAGCCCCGTCCCTGCAACAAGCTCTCTCTTTCTTGCCACACACTTC 974 

TAACTTGGTGTTCAAGATAACTTCCAGGTGTGTTTTTGCnTCTCTTTC 1053 

GGGAGTGCTTGAGTAGCTrCTCAAGTGTCTTTTCCAGACAGACTTATGAATACTTCACACCCTCTACTTCACACTTGTT 1132



AATGTCCCAGTGTAGCTGCCTTGTCAGCGTGCTGGCCTCCCCACTTGACTTTTGCACTGACTACATTACCTAAGATTCT 1211 
GG'rTAGCCTGTGGCTGCATTTCATCACCAGTTCGATCTCAAATGCCTGGGGCCTCCTCACAAAATGAAGATTTGTTTCA 1290 
TGCACTGTCATGTCTGACGCAACATCTTCTAGAACAGACTCCCCATCrGCTAGTTTACACTGATACCTAAACACACTCT 1369 

CCCAAGCCTCCCTGGATGATTGACGTACAAATACTGATCAGCCTTTTCTGTCTTGCTGAGACGCAGTTCTTTGAACTGA 1527 



TGTGGGCAGCTTTGAACAAGGACTAGAGTTCAGATTGCCTCTCTCTGAGAAGTCTAACAGTTATTGGATAACTGGCTTT 1606 
TTTCTTCCTACATCCTCTTTGGAATGTAACAATAA^ 1681



WO 00/18904 



16/112 



PCT/US99/22817 



GTCGACCCACGCGTCCGCTCTCAGTCACCGGAATCTACGTGGGGCCGCCCGGAGCGGCGTCCTCGGGAGCCGCCTCCCC 79

GCGGCCTCTTCGCTTTTGTGGCGGCGCCCGCGCTCGCAGGCCACTCTCTGCTGTCGCCCGTCCCGCGCGCTCCTCCGAC 158

MIRCGLACE 9 
CCGCTCCGCTCCGCTCCGCTCGGCCCCGCGCCGCCCGTCAAC ATG ATC CGC TGC GGC CTG GCC TGC GAG 227 

RCRWILPLLLLSAIAFDIIA 29
CGC TGC CGC TGG ATC CTG CCC CTG CTC CTA CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 287 

LAGRGWLQSSDHGQTSSLWW 49
CTG GCC GGC CGC GGC TGG TTG CAG TCT AGC GAC CAC GGC CAG ACG TCC TCG CTG TGG TGG 347

KCSQEGGGSGSYEEGCQSLM 69 
AAA TGC TCC CAA GAG GGC GGC GGC AGC GGG TCC TAC GAG GAG GGC TGT CAG AGC CTC ATG 407 

EYAWGRAAAAMLFCGFIILV 89
GAG TAC GCG TGG GGT AGA GCA GCG GCT GCC ATG CTC TTC TGT GGC TTC ATC ATC CTG GTG 467 

ICFILSFFALCGPQMLVFLR 109
ATC TGT TTC ATC CTC TCC TTC TTC GCC CTC TGT GGA CCC CAG ATG CTT GTC TTC CTG AGA 527 

VIGGLLALAAVFQIISLVIY 129
GTG ATT GGA GGT CTC CTT GCC TTG GCT GCT GTG TTC CAG ATC ATC TCC CTG GTA ATT TAC 587 

PVKYTQTFTLHANPAVTYIY 149 
CCC GTG AAG TAC ACC CAG ACC TTC ACC CTT CAT GCC AAC CCT GCT GTC ACT TAC ATC TAT 647

NWAYGFGWAATIILIGCAFF 169
AAC TGG GCC TAC GGC TTT GGG TGG GCA GCC ACG ATT ATC CTG ATT GCC TGT GCC TTC TTC 707 

FCCLPNYEDDLLGNAKPRYF 189 
TTC TGC TGC CTC CCC AAC TAC GAA GAT GAC CTT CTG GGC AAT GCC AAG CCC AGG TAC TTC 767 

Y T S A 194 
TAC ACA TCT GCC TAA 782 

CTTGGGAATGAATGTGGGAGAAAATCGCTGCTGCTGAGATGGACTCCAGAAGAAC 861 

AACCCATTTTTTGGCAGTGTTCATATTATTAAACTAGTCAAAAATGCTAAAATAATTTGGGAGAAAATATTTTTTAAGT 940 

ACrrCTTATAGTTTCATGTTTATCTTTTATTATGTTTTC 1019 

ATTTCCTTATATCTATCCATAACATTTATACTACATTTGTAAGAGAATATGCACGTGAAACTTAACACTTTATAACGTA 1098 

AAAATCACGTTTCCAACATTTAATAATCTC ATCAAGTTCTTGTTATTTCCAAATAGAATGGACTCGGTCTGTTAAGGGC 1177 

TAAGGAGAAGAGGAAGATAAGGTTAAAAGTTGTTAATGACCAAACATTCTAAA^ 1256 

CAAGCCTTCGAACTATTTAAGGAAAGCAAAATCATTTCCTAAATGCATATCATTTGTCAGAATTTCTCATTAATATCCT 1335 

GAATCATTCATTTTAGCTAAGGCTTCATGTTGACTCGATATGTCATCTAGGAAAGTACTATTTCATGGTTCAAACCTGT 1414

TGCCATAGTTGGTAAGGCTTTCCTTTAAGTGTGAAATATTTAGATGAAATTrTCTCTTTTAAAGTTCTTTATAGCGTTA 1493

CGGTCTGGGAAAATGCTATATTAATAAATCTGTAGTGTTTTGTGTTTATATCTTCAGAACCAGAGTAGACTGGATTGAA 1572

ACATGCACTGCCTCTAATTTATCATGACTCATAGATCTGGTTAAGTTGTCTAGTAAAGCATTAGGACGGTCATTCTTGT 1651
CACAAAAGTGCCACTAAAACAGCCTCACGAGAATAAATGACTC^ 1730

TATAGACAGGCTTCTGATAGTTTGCAA^^ 1809 

GATTTTAAATGTCTGATATAAAACATGCCACAGGAGAATTCGGGGATTTGAGTTTCTCTGAATAGCATATATATGATGC 1888

ATCGGATAGGTCATTATGATTTTTTACCATTTCGACTTACATAATGAAAACCAATTCATTTTAAATATCA 1967 

TTTTGTAAGTTGTGGAAAAAGCTAATTGTAGTTTTCATTATGAAGTTTTCCCAATAAA 2046 

AAAAAAAAAAAGGGCGGCCGC 2067
GTCGACCCACGCGTCCGGCGCTCTGAGTCACCGGAATCAAGGTGTGGCTGGAGCGCCGCTCCCCCGCCGCCAGCCCGGG 79

GGCCGCGTCTTCGGGGGAGCCGCCTCTTCCTTTAGTCGCGGTGTCAGCGCTCGCAGGACCACTCTTGGCCGCTGCTCCT 158

MLRCGLACE 9 
GCCCGGCGTTCCTCCGCTCCGCGCCCGCCCCCACCGACGAC ATG CTG CGC TGC GGC CTG GCC TGC GAG 226 

RCRWILPLLLLSAIAFDIIA 29 
CGC TGC AGG TGG ATC CTG CCC CTG CTG CTG CTC AGC GCC ATC GCC TTC GAC ATC ATC GCG 286 

LAGRGWLQSSNHIQTSSLWW 49 
CTG GCC GGC CGC GGC TGG CTG CAG TCT AGC AAC CAC ATC CAG ACA TCG TCG CTT TGG TGG 346 

RCFDEGGGSGSYDDGCQSLM 69 
AGG TGT TTC GAC GAG GGC GGC GGC AGC GGC TCC TAC GAC GAT GGC TGC CAG AGC CTC ATG 406 

EYAWGRAAAATLFCGFIILC 89 
GAG TAC GCA TGG GGA CGA GCA GCT GCA GCC ACG CTT TTC TGT GGC TTT ATC ATC CTG TGC 466 

ICFILSFFALCGPQMLVFLR 109
ATC TGC TTC ATT CTC TCG TTC TTC GCC CTG TGT GGA CCC CAG ATG CTT GTT TTC CTG AGA 526 

VIGGLLALAAIFQIISLVIY 129
GTC ATT GGA GGC CTC CTC GCA CTG GCT GCC ATA TTC CAG ATC ATC TCC CTG GTA ATC TAC 586 

PVKYTQTFRLHDNPAVNYIY 149
CCC GTG AAG TAC ACA CAG ACC TTC AGG CTT CAC GAT AAC CCT GCT GTT AAT TAC ATC TAT 646 

NWAYGFGWAATIILIGCSFF 169
AAC TGG GCC TAT GGC TTC GGA TGG GCG GCC ACC ATC ATC TTG ATT GGT TGT TCC TTC TTC 706 

FCCLPNYEDDLLGAAKPRYF 189
TTC TCC TGC CTC CCC AAC TAC GAG GAT GAC CTT TTG CCC GCC GCC AAG CCC AGG TAC TTC 766 

Y P P A * 194 
TAT CCC CCA GCC TAA 781 

TGTGGGAGGAAGAGCCTGAGAAAAGCCTCCTGCAAGATGGATCTGAGGAGCAAACTGTTCTCCAACGCACAAGGAACCT 860 
ACGTTTGCGCAATGTTCATATGATCAGAAATGCTACAATA^ 939

ATGTATCTCGTGTCCAGTTAAAAAGACTTCAATTCTCTTTC 1018
CCATTTAAGCTTCATTTGTTAAAGAATATGCCTGTGAAACTTGATAAGGTACAAATGTAGCAGCCTCT 1097 
CTGATGGGCCTTCTGTTTTTCCACATAGAATCCGTTGTTTCTGCTAA 1176 
TTCCGTGACCAAATATCCTGAAATTAGTATTTTTTTAAAAAGACCTTATTTTGAGTTTTCAGTTACATAAAAAAGCAGA 1255 
ACCACATTGGTTTCCTAAGTGACCATCGTTTGTGAGAATTTTTAGTCAGTG 1334

TCGTGTTGACTTTCTCTGATGCGTAGAAAAGTGTTCTAACGTAGCCAACGTTAAGCCGCTGTCACTACTGAAATGCT^ 1413 
GAATTTTCCTCTTTTCCCGTAGTGTAGACGGGTAGGGTGTGCCAAGAAGCCGTGTTACCACATCTGTACTATTCTGTGT 1492

CTATGCTTAGAACCAGCCTAGACCCGATGGCAGGATGCACTACGCCTAATCCCTCCCAACTCGTGGATGTGAAGACGTC 1571 
AGGTAGGAAGGCACAGGAGGGTCACCACTGTCACAGCAGTGCCATGCAGACATCCTAGGAGAAGACATGGCAGTGTTTC 1650
TTCTCAGTGCTTCTTCCCTTAACTGAGCTCTGCTCACAGACAGCTAGAATAGATTTTAACTGTAACAGAAACCTAAATG 1729

TGTCTCTGAATACATACCGGAAGGGCTACTATTACCTTTTCCTTA 1887

TTAACTATCAGAACACTATTTTGTAAGGTGCTGCAAAGACAGTTGAAGTTT^ 1966 
TGTTCAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG^ 2030 
GTCGACCCACGCGTCCGGCCGCGCGCTCTCTCCCGGCGCCCACACCTGTCTGAGCGGCGCAGCGAGCCGCGGCCCGGGC 79

MAGIPGLLFLLF 12 
GGGCTGCTCGGCGCGGAACAGTGCTCGGC ATG GCA GGG ATT CCA GGG CTC CTC TTC CTT CTC TTC 144 

FLLCAVGQVSPYSAPWKPTW 32 
TTT CTG CTC TGT GCT GTT GGG CAA GTG AGC CCT TAC AGT GCC CCC TGG AAA CCC ACT TGG 204 

PAYRLPVVLPQSTLNLAKPD 52 
CCT GCA TAC CGC CTC CCT GTC GTC TTG CCC CAG TCT ACC CTC AAT TTA GCC AAG CCA GAC 264

FGAEAKLEVSSSCGPQCHKG 72
TTT GGA GCC GAA GCC AAA TTA GAA GTA TCT TCT TCA TGT GGA CCC CAG TGT CAT AAG GGA 324 

TPLPTYEEAKQYLSYETLYA 92
ACT CCA CTG CCC ACT TAC GAA GAG GCC AAG CAA TAT CTG TCT TAT GAA ACG CTC TAT GCC 384 

NGSRTETQVGIYILSSSGDG 112
AAT GGC AGC CGC ACA GAG ACG CAG GTG GGC ATC TAC ATC CTC AGC AGT AGT GGA GAT GGG 444

AQHRDSGSSGKSRRKRQIYG 132 
GCC CAA CAC CGA GAC TCA GGG TCT TCA GGA AAG TCT CGA AGG AAG CGG CAG ATT TAT GGC 504 

YDSRFSIFGKDFLLNYPFST 152
TAT GAC AGC AGG TTC AGC ATT TTT GGG AAG GAC TTC CTG CTC AAC TAC CCT TTC TCA ACA 564 

SVKLSTGCTGTLVAEKHVLT 172 
TCA GTG AAG TTA TCC ACG GGC TGC ACC GGC ACC CTG GTG GCA GAG AAG CAT GTC CTC ACA 624 

AAHCIHDGKTYVKGTQKLRV 192
GCT GCC CAC TGC ATA CAC GAT GGA AAA ACC TAT GTG AAA GGA ACC CAG AAG CTT CGA GTG 684 

GFLKPKFKDGGRGANDSTSA 212 
GGC TTC CTA AAG CCC AAG TTT AAA GAT GGT GGT CGA GGG GCC AAC GAC TCC ACT TCA GCC 744

MPEQMKFQWIRVKRTHVPKG 232 
ATG CCC GAG CAG ATG AAA TTT CAG TGG ATC CGG GTG AAA CGC ACC CAT GTG CCC AAG GGT 804 

WIKGNANDIGMDYDYALLEL 252
TGG ATC AAG GGC AAT GCC AAT GAC ATC GGC ATG GAT TAT GAT TAT GCC CTC CTG GAA CTC 864 

KKPHKRKFMKIGVSPPAKQL 272 
AAA AAG CCC CAC AAG AGA AAA TTT ATG AAG ATT GGG CTG AGC CCT CCT GCT AAG CAG CTG 924 

PGGRIHFSGYDNDRPGNLVY 292
CCA GGG CGC AGA ATT CAC TTC TCT GGT TAT GAC AAT GAC CGA CCA GGC AAT TTG GTG TAT 984 

RFCDVKDETYDLLYQQCDAQ 312
CGC TTC TGT GAC CTC AAA GAC GAG ACC TAT CAC TTC CTC TAC CAG CAA TCC GAT GCC CAG 1044

PCASCSGVYVRMWKRQQQKW 332 
CCA GGG GCC ACC CCC TCT GGG GTC TAT GTG AGG ATG TGG AAG ACA CAG CAG CAG AAG TGG 1104 

ERKIIGIFSGHQWVDMNGSP 352
GAC CGA AAA ATT ATT GGC ATT TTT TCA GGG CAC CAG TGG GTG CAC ATG AAT GGT TCC CCA 1164 
QDFNVAVRITPLKYAQICYW 372
CAG GAT TTC AAC GTG GCT GTC AGA ATC ACT CCT CTC AAA TAT GCC CAG ATT TGC TAT TGG 1224 

IKGNYLDCREG* 384



CACAGTGTTCCCTCCTGGCAG(^TrAAC^GTCTTCATGTTCTTATTTTAGGAGAGGCa^ 1339

CGTGCACACGTGTGTGTGTGTGTGTGTGTGTAAGGTGTCTTATAATCrrrr ACCTATTT^ 1418 



ATAAAAAAAATACTGATTTGGGGCAATGAGGAATATTTGACAATTAAGTTAATCTTCACGTTTTTGC^ 1576 

TTATTTCATCTGAACTTGTTTCAAAGATTTATATTAAATATTTGGCATA 1655

GTGTGTTTTCTTCTGAGATTCATCTTGGTGGTGGGTTTTTTTGTTTTTTTAATTCAGTGCCTGATCT^ 1734

TAAGGCAGTGTTCCCATTTAGGAACTTTGAGAGCATTTGTTAGGOG^ 1813 

GTCTTTGAACAGTAAAATGATGTGTTGACTATACTC^TACACATATTAAACTATACCTTATAGTAAACCAGTAT 1892 

GCTGCTTTTAGTTCCAAAAATAGTTTCTTTTCCAAAGGTTC 1971

CCAACTTTAAAGTCATACCAGAGTCGCCAAGAGTGTTTATCCCAACCCTTCCATTTAACAGGAT^ 2050 

GGAACTAGCTATTITTCAGAAGACAATAATCAGGGCTTAATTAGAACAGGCT^ 2129 

CCACACTAAAAACAATCATACCATTTTACCCCTCGATTATAGCACATCTC^^ 2208 

AAATGAATTAAATTCCAGAGAACAATGGAAGCATTGCCTGGCAGATGTCACA^ 2287 



TGGAAACTTTTCTCTCTCATTTATAGTGAAAATACTTGGAAGTTACTTTAAGAAAACCAGTGTGGCCTTT^ 2445 
GCTTTAAAACGGCCGCTTTTGCTGGAATGCTCTAGGTTATAGATAAACAATTAGGTATAATAGCAAAAATGAAAATTGG 2524
AAGAATGCAAAATGGATCAGAATCATCCCTTCCAATAAAGGCCTTTACACATG 2603 
AGCATATACAGAAAACACTTGGACTTATTCTATGTTTTTATTTTATGGCTCTCCGCCTAACCACTTCT 2682 
TCGGACAAAAAATCAAATGGACTACAAGCACGTGTTTGCTGTGCTTCCACCCCAGGTAAACCn'GCATTGTACCAATTT^ 2761 
TAAGGATATTCACATCCAGCACrCTCACTTACACATTCTCTGCGG 2840

GGATAATTCTGATAAGGCACTCAAGAAACGTACAACCACAGTGCTTTCTTCAAATCATATGAGAAATACTATGCATAGC 2919 

AAGGAGATGCAGAGCCGCCAGGAAAATTCTGAGTTCCAGCACAATTTTCTTTGGAATCTAACAGCAATCTAGCCTGACG 2998 

AACAAGGCAGGTCTCCATTTCTATCTCTGCTATTTGGGCGTTTTGTTTGTTTTTCCTTTA 3077

ACTGAACACCAAGACCAGAATGGATTTTTTTAAAAAAATAGATGTTCCTTTTGTGAAGCACCTTGATTCCT 3156 

ATTTTTTCCAAAGTTAGACAATO 3235

GAATGATACACCCATATGCTATATACAGCTT/\ACTCACAGAACTGTAAAAGAAAATTATAAAATAATTCAA(^TG 3314

TCTTTTTACTGATAATAAAAGAAACCATGGTA'nTAAACTATCA 3393 



ATT AAA GGA AAC TAC CTG GAT TGT AGG GAG GGG TGA 1260




TATCATATATCATTTAAGCAGTTTGAAGGCATACTTTTGCATAGAA 1497




2366 
ATTATTAATATAATTACTGCTTTACATG^ 3472

ACATTTCCCAAAGTGTGCTCCTTAAACACTCATGCCTTAT^ 3551 

ACX3AAGATGCCTCTCCATTTTCCCTCTCTTTATCAGAGGTTCA^ 3630 

TGTTGTAAAGGGACAAGTTGAGGTTGTAAAATCTGC\TTTAAATAAACATCTTTGATCACAAAAA 3709 

GGCCG 3714 



FIG. 13 (3 of 3)






GTCGACCCACGCGTCCGCGGACGCGTGGGCACTCGGCCACTCTGCGGAGCAGGCATGGGAGCCGCGCGCGTCCTCCGGG 79 

MA 2 

CGCCCACACCTGTCTGAGCGGCGCACGGCCGCGGCCCCGGCGGGCTGCTCCACGCGGTAGCACTCAGC ATG GCT 153 

GIPGLFILLVLLCVFMQVSP 22 
GGA ATC CCG GGG CTC TTC ATC CTT CTT GTC CTG CTC TGT GTG TTC ATG CAG GTG AGT CCC 213 

YTVPWKPTWPAYRLPVVLPQ 42
TAC ACC GTT CCG TGG AAA CCC ACA TGG CCG GCT TAT CGC CTC CCT GTA GTC TTG CCT CAG 273 

STLNLAKADFDAKAKLEVSS 62
TCT ACC CTC AAC TTA GCT AAG GCA GAC TTC GAC GCC AAA GCG AAA TTG GAG GTG TCC TCC 333 

SCGPQCHKGTPLPTYEEAKQ 82
TCA TGT GGA CCT CAG TGT CAC AAG GGA ACA CCA CTG CCC ACC TAC GAA GAG GCC AAG CAG 393 

YLSYETLYANGSRTETRVGI 102
TAC CTT TCC TAT GAA ACC CTT TAT GCC AAT GGC AGC CGC ACA GAG ACT CGG GTG GGC ATC 453 

YILSNGEGRARGRDSEATGR 122 
TAC ATC CTC AGC AAT GGT GAA GCC AGG GCA CGA GGC AGA GAC TCG GAG GCC ACA GGG AGA 513 

SRRKRQIYGYDGRFSIFGKD 142 
TCT CGC AGG AAG AGG CAG ATT TAT GGC TAC GAT GGC AGG TTT AGC ATT TTT GGG AAG GAC 573 

FLLNYPFSTSVKLSTGCTGT 162 
TTC CTG CTC AAT TAT CCT TTC TCA ACA TCG GTG AAG TTG TCT ACT GGC TGC ACT GGC ACC 633 

LVAEKHVLTAAHCIHDGKTY 182 
CTG GTG GCA GAG AAG CAC GTC CTC ACT GCT GCC CAC TGC ATA CAC GAT GGG AAA ACC TAT 693 

VKGTQKLRVGFLKPKYKDGA 202 
GTG AAA GGG ACA CAG AAA CTC CGA GTG GGC TTC CTG AAG CCC AAG TAT AAA GAT GGT GCC 753 

EGDNSSSSAMPDKMKFQWIR 222 
GAA GGG GAC AAC AGC TCG AGC TCA GCC ATG CCA GAC AAG ATG AAG TTT CAG TGG ATC CCC 813 

VKRTHVPKGWIKGNANDIGM 242 
GTG AAA CGC ACC CAT GTG CCC AAG CCG TCG ATC AAG GGC AAT GCC AAT GAC ATC GGC ATG 873

DYDYALLELKKPHKRQFMKI 262
GAT TAT GAC TAC GCC CTG CTG GAA CTC AAG AAA CCC CAC AAA AGA CAG TTC ATG AAG ATT 933

CVSPPAKQLPGGRIHFSGYD 282 
GGT CTG AGT CCT CCA GCG AAG CAG CTC CCA CGG GGC AGG ATC CAC TTC TCT GGT TAT GAC 993 

NDRPCNLVYRFCDVKDETYD 302 
AAT GAC CCG CCC CGC AAT TTG GTG TAC CGC TTC TGT GAT GTC AAA GAT GAG ACC TAC GAC 1053 

LLYQQCDAQPGASGSGVYVR 322
CTT CTC TAC CAG CAG TGT GAC CCC CAG CCC GCG GCC AGT GCT TCA CGG GTC TAT GTG AGG 1113 

MWKRPQQKWERKIIGIFSGH 342
ATG TCC AAG AGA CCA CAG CAG AAA TGG GAA ACA AAA ATT ATC GGC ATC TTT TCA GGG CAC 1173 

QWVDMNGSPQDFNVAVRITP 362
CAG TGC GTG GAC ATG AAT CCC TCT CCA CAC CAT TTC AAC CTG CCA GTT AGA ATC ACG CCT 1233
LKYAQICYWIKGNYLDCREG 382

CTT AAA TAT GCC CAG ATT TGC TAT TGG ATT AAA GGA AAC TAC CTA GAT TGC AGG GAG GGG 1293 

* 383 

TGA 1296 

CATGCGTCCTCTTCCCAGCACCAATGGTCTTTTTGCA 1375 

GTGTGAGTCACATAGTATCTTTTACCTAGTATTCTTCAAATGGCAAAAATTATTGGCTATA 1454

GTGCGTTATAGCATTTAAGCAGTCTGAAAGCATACTTTTCCATAGAGACTTTAAAGTATTCGGGTAATAGGGCCTATTT 1533

GACAAGGAAGTTAAACTTTCAGTTTTTGGAGAATTCTAATTTTTG^ 1612 

ATACGTGACACACAGGGAATATGAATTCTTATGTTTGTATATGTATA 1691 

TTGTAATGTGTGGTTATTATGCTTCCAGATAATGATAGCAAAGTCTTCAATAGGCAATTTATAATGTTTTGGATTC 1770 

CATTTACGTAGTAGTCCTTGAAGAGAACAATAATTTATTGGCTATATTGATACCCATATAAGACTGTATCTTACAGTGC 1849

ACAGAATTCCCACGCTGCTTTTAGTTTTGAAAATAAAACTTTCCCTTGTAAAAAAAAAAAAAAAAAAAAAGGGCGGCCG 1928
MAPASRLLALWALA 14
GTCGACCCACGCGTCCGGGCTC ATG GCG CCG GCG TCG CGG TTG CTC GCG CTC TGG GCG CTG GCG 64 

AVALPGSGAEGDGGWRPGGP 34
GCT GTG GCT CTA CCC GGC TCC GCG GCG GAG GGC GAC GCC GGG TGG CGC CCG GGC GGG CCG 124 

CAVAEEERCTVERRADLTYA 54
GGG GCC GTG GCG GAG GAG GAG CGC TGC ACG GTG GAG CGT CGG GCC GAC CTC ACC TAG GCG 184 

EFVQQYAFVRPVILQGLTDN 74
GAG TTC GTG CAG CAG TAC GCC TTC GTC AGG CCC GTC ATC CTG CAG GGA CTC ACG GAC AAC 244 

SRFRALCSRDRLLASFGDRV 94
TCG AGG TTC CGG GCC CTG TGC TCC CGC GAC AGG TTG CTG GCT TCG TTT GGG GAC AGA GTG 304

VRLSTANTYSYHKVDLPFQE 114
GTC CGG CTG AGC ACC GCC AAC ACC TAC TCC TAC CAC AAA GTG GAC TTG CCC TTC CAG GAG 364 

YVEQLLHPQDPTSLGNDTLY 134
TAT GTG GAG CAG CTG CTG CAC CCC CAG GAC CCC ACC TCC CTG GGC AAT GAC ACC CTG TAC 424 

FFGDNNFTEWASLFRHYSPP 154
TTC TTC GGG GAC AAC AAC TTC ACC GAG TGG GCC TCT CTC TTT CGG CAC TAC TCC CCA CCC 484 

PFGLLGTAPAYSFGIAGAGS 174
CCA TTT GGC CTG CTG GGA ACC GCT CCA GCT TAC AGC TTT GGA ATC CCA GGA GCT GGC TCG 544

GVPFHWHGPGYSEVIYGRKR 194 
GCG GTG CCC TTC CAC TGG CAT GGA CCC GGG TAC TCA GAA GTG ATC TAC GGT CGT AAG CGC 604 

WFLYPPEKTPEFHPNKTTLA 214
TGG TTC CTT TAC CCA CCT GAG AAG ACG CCA GAG TTC CAC CCC AAC AAG ACC ACG CTG GCC 664 

WLRDTYPALPPSARPLECTI 234
TGG CTC CGG GAC ACA TAC CCA GCC CTG CCA CCG TCT GCA CGG CCC CTG GAG TGT ACC ATC 724 

RACEVLYFPDRWWHATLNLD 254
CGG CCT GGT GAG GTG CTG TAC TTC CCC GAC CGC TGG TGG CAT CCT ACG CTC AAC CTT GAC 784 

TSVFISTFLG* 265 
ACC AGC CTC TTC ATC TCC ACC TTC CTC CGC TAG 817 

CCAAAACAGCTGGCAGGACTGCCGGTCACACACCAGCACGTCCCACCTCGTGCTCACGGATTTTATTACACAGATAGTG 896

GCC^CAATGGCCTCAGCCCAGCCCACCCTCACCTGCTTTTCCAGCCCACAAAGGGGCACGATCACGCCCCAGCAAAAGC 975 

GATCCTCAGAGGGGAAACACTCCAGAGTCCAACAGCAGAACTTGGGGGAAGCGGTCGGGGTGGCCAGGAACATAAACTA 1054

TCTATACGCGCCCGCCGCTTCTCCCCACCCCTCCCCTGCACCAGGACCCCACGTACGGCAGCGAACCTCAGTACTCCTC 1133

CACCCAGCCATTCTCAGAGATGAATGCGTCAATAACCTCCTTCATAGCCAAGTTGGGGATGAGCTGTTCCTGGGTCAGG 1212 

OCGCTCCGGGTCACCCCGTCAAAATGACCCACACGCTGCAGTCACAAGAAGGGCAGAGGCCAGTCATGGGGCCCACGAC 1291 

CATGCCACTCCCCCTGCTCCCCCAGCCGCACGCCTCACCTGCAGGTGCTCCTCGATGTCCTTGCGGTCGTAGGTGATGC 1370

CACTGOGCGTGATGCACGGi:TCCCCCATCAGCTCAA/\GCTCATCTTCCCACACAGCTAGTCCGCGATGTCTCGCTTCTC 1449
TGGCACAGGGGCACACGGTCAGAGGCTGAAAAGGGGCACTGCACGAGCACCTGCCAGCCATCGGCAGCAAGCGACACAC 1528
ACTCACCTTCCTCTTCTCATCCACCTGAGAAAAAAGCrCGTCCATGT^ 1607 
GCTGTGCTTGGGGGAGACACCCCACCTCCCTCCTCC^TGGGGCACAGACCCAACACAAGGCGGGGATGCTCCCACGCCA 1686 
CGTGCACA<^CACAGACCCACATGTGGGTGCK3GGGC^CCCTCACGTGCTTGGCCTCAATGC^GGCC7GCTGGGCCCGM 1765 
CGTGGCTGTCGTCCTCATCACCCTCGTGGTTTCGCTGGCACTCTTCCAGCTCCCTGGGGGTTGACCAGGAGCCGGTCAG 1844 
AGATGGACCTGGCCAGATGTCTGACCACACCCCAATCTCAGAGCTAACATCCAC^ 1923 
GTAAAGCCTTCGATAAACAAAAAAAAAAAAAAAAAAAAGGGCGGCCG 1970 



FIG* ^ ( 
MAAAGRRGLLLLFV 14
GTCGACCCACGCGTCCGGTTC ATG GCG GCG GCT GGG CGG CGC GGT CTG CTT TTG CTC TTT GTA 63

LWMMVTVILPASGEGGWKQN 34 
CTA TGG ATG ATG GTG ACT GTG ATT CTG CCT GCC TCT GGC GAA GGG GGA TGG AAA CAG AAT 123 

GLGIAAAVMEEERCTVERRA 54
GGG CTG GGA ATT GCA GCA GCA GTA ATG GAG GAG GAG CGT TGC ACA GTG GAG CGT CGG GCA 183 

HITYSEFMQHYAFLKPVILQ 74
CAC ATC ACG TAC TCC GAA TTC ATG CAG CAC TAT GCC TTC CTC AAG CCC GTC ATC TTG CAA 243 

GLTDNSKFRALCSRENLLAS 94 
GGA CTC ACG GAC AAC TCG AAG TTC CGG GCC CTG TGT TCC CGG GAA AAC CTG CTA GCC TCG 303 

F GDNIVRLSTANTYSYQKVD114 
TTC GGG GAC AAC ATT GTT CGC TTG AGT ACA GCC AAC ACC TAC TCC TAC CAG AAA GTG GAC 363 

LPFQEYVEQLLQPQDPASLG134 
CTG CCC TTC CAG GAA TAT GTG GAA CAG CTG CTG CAG CCC CAG GAT CCT GCA TCC CTA GGC 423

NDTLYFFGDNNFTEWASLFQ154 
AAT GAC ACC CTG TAC TTT TTT GGA GAC AAC AAC TTC ACT GAG TGG GCA TCC CTC TTC CAG 483 

HYSPPPFRLLGTTPAY SFGI 174 
CAC TAC TCT CCG CCA CCA TTC CGT CTC CTG GGA ACC ACC CCT GCT TAC AGC TTT GGA ATT 543

AGAGSGVPFHWHGPGFSEVI 194 
GCA GGA GCT GGA TCT GGG GTA CCC TTC CAC TGG CAT GGG CCT GGT TTC TCA GAG GTT ATC 603 

YGRKRWFLYPPEKTPEF HPN214 
TAT GGT CGG AAG CGC TGG TTC CTC TAC CCT CCT GAG AAG ACA CCT GAG TTC CAC CCT AAC 663 

KTTLAWLLEIYPSLALSARP 234
AAG ACC ACA TTG GCC TGG CTG CTG GAA ATA TAC CCA TCT CTA GCC CTG TCA GCA CGG CCT 723 

LECTIQACEVLYFPDRWWHA254 
CTA GAA TGT ACC ATC CAG GCT GGT GAA GTA CTG TAT TTT CCT GAT CGG TGG TGG CAT GCC 783

TLNLDTSVFISTFLG* 270
ACA CTC AAT CTG GAC ACC AGT GTC TTC ATT TCT ACC TTC CTT GGC TAG 831 

CCACACAGGCAACTGGCAAGCCCACTGCACCAGCACATGCCAATGTAGTCCTCACAGACTTTATTACAGGACAGTGGCA 910 

GCAGCAGCAACCTCAGCCCACCCTCACCCACTCTCCACCCCAGAAGGGGCACAAGGGAGGCTCATGGTCCAGCAAGGGG 989 

TATGCTGAGAAGGGGAGCAGTTCAGAACCCATCAGCACGGCCGATGGGGGCACGCCCAGGGACACAAACTATACACGGA 1068 

CTGGACCTTCCCTCTCCAGATCCTCCTCGGCCAGGCTGCCAGGCAGGACATGGGCCCTCAATAGTCCTCTACCCACCCG 114 7 

TTCTCAGAGATGAAAGCGTCAATCACTTCCTTCATGGCCAAGTTGGGGATGAGCTGTTCCTCGGTCAAAGGGCTCCGGG 1226 

TCACAGGCTCAAAGTCGCCCACACCCTCCAACAG AGTCAAGACTGTTCAATGGCCTGAGTATACCCATCCGGGTACCAA 1305 

CGCTCTCCATCGCCCGGTCTCCATGCCCCCTCCTTACCTCCAGGTGCTCCTCAATGTCCTTGCGGTCATAGGTCATACC 1384 

ACTGCGTCTAATCCAGGCTTCCCCCATCACCrCAAACCTAATC 1463
AGCACAAGGGGAAAATGTCTAGAACTC^AGGCGGCTG^ 1542

TCCTCACCTTTCTTTTCTCGTCCACCTGAGAGAAGAGCTCATCCATATCTGCCATGTATTTATCCTGCAGAGTTGAGTG 1621 

CCATGTGTGGGCAACTCCTGTCTCCACACAGACACACACACTCTGTCCACCAGGGCACTCATGTCATGCATGGGCCAAC 1700 

AGATCCACCAAAGGCTGGGGCACTTTTCATGCCACACACAAACACACACACAATGACCCACATGTGGACTAGGGGCACC 1779 

CTCACGTGCTTGGCCTCAATGCAGGCCTGCTGGGCCCGGATGTGGCCATCATCTTCATGACCCTCGTGGTTCCGCTGAC 1858

ACTCCTCCAGTTCCCTGAGGGTTAACCAGAAGCTAGTTGGTGATGGCCCTGACCAGGAAATCACAGAGCCCGCCCCATC 1937 

TCAGGCCTCTTTCCTCCTGGGCTTCCCATGTACCGGT7GTTGTCCTTCAATAAAAACACTTGTGCTGGTGACTCAGTGT 2016 

CTGCTGGGGGAGGGACCCACCTCTCTCGCTCAGCAGCAATGAGCCTGGTGAGATATGAATGCAAAAAAAAAAAAAAAGG 2095 

GCGGCCG 2102
MDNRFA 6
CACGCGTCCGGCTGGCGGAGCAGGAGGATGGGCGAGCAGTCTGAATGCCAGA ATG GAT AAC CGT TTT GCT 70



TAFV IACVLSLISTIYMAAS 26 
ACA GCA TTT GTA ATT GCT TGT GTG CTT AGC CTC ATT TCC ACC ATC TAC ATG GCA GCC TCC 130 

IGTDFWYEYRSPVQENSSDL 46 
ATT GGC ACA GAC TTC TGG TAT GAA TAT CGA AGT CCA GTT CAA GAA AAT TCC AGT GAT TTG 190 

NKS I WDEFISDEADEKTYND 66 
AAT AAA AGC ATC TGG GAT GAA TTC ATT AGT GAT GAG GCA GAT- GAA AAG ACT TAT AAT GAT 250 

ALFR YNGTVGLWRRCITI PK 86 
GCA CTT TTT CGA TAC AAT GGC ACA GTG GGA TTG TGG AGA CGG TGT ATC ACC ATA CCC AAA 310 

NMHWY S P PERTES FDVVTKC106 
AAC ATG CAT TGG TAT AGC CCA CCA GAA AGG ACA GAG TCA TTT GAT GTG GTC ACA AAA TGT 3 70 

VSFTLTEQFMEKFVD PGNHN 126 
GTG AGT TTC ACA CTA ACT GAG CAG TTC ATG GAG AAA TTT GTT GAT CCC GGA AAC CAC AAT 4 30 

SGIDLLRTYLWRCQFLLPFV 146 
AGC GGG ATT GAT CTC CTT AGG ACC TAT CTT TGG CGT TGC CAG TTC CTT TTA CCT TTT GTG 4 90 

SLGLMCFGALIGLCACICRS 166 
AGT TTA GGT TTG ATG TGC TTT GGG GCT TTG ATC GGA CTT TGT GCT TGC ATT TGC CGA AGC 550

LYPTI ATGI LHLLAGNYSDS 186 
TTA TAT CCC ACC ATT GCC ACG GGC ATT CTC CAT CTC CTT CCA GGA AAT TAC TCA GAT TCT 610 

W L H E * 191 
TGG CTC CAT CAA TAA 625 

TTTTAATGATCTTCTACATTATCCTTC AT AATTACTCATTTCTCAATAATCTTTTAATTTCATCCCATG ACTCTG AGG A 704 

TAGCTTCCAAGCTCTTTAAATGGCCTTACAAACTCATTC 783 

CCACTGGGCCATCCCTATCGTAGTTTAAAAACATGCCCTTAAAATCCTTCGATCAATCTTCCATTCAGATTCCCATCCC 862 

CTTGAATCTAGGCTCGCTTGTGATGGTTTTGACCAATAGAG 94 1 

ATCATGTCTCCTTAAACCAGTTCTCTTGGAACACTCAGTCrTAGAACATTCCCTCTCCAAACCCAGATACCATGCTGTG 1020 

AAGTCCAGGCCACATCGAGGTGTCCTGTGTACATCCTCCAGCTGAAATCCCAAGCTAAGCTCCCAACTGACACCCAACA 1099 

TCATTTCCAGCCATGTGTGCGAGCCATCCTGGATGTCCACCCTTAACAACCCTTCAGAGGACTTCAGCCACAGCTATTA 1178 

TCrTACTACATCCTTGTGAGACTCTAATAAAGAJVCCAACTAGCTGAGCCCAATCAACCTATGGAACTGATAGAAATAAA 1257 

ATGAATTGTTGTTTTGTGCCCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAA 13 08 
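
The sequence figures above print each open reading frame as codon triplets with the one-letter amino acid translation on the line above. As a minimal sketch, assuming the standard genetic code, a codon row can be retranslated and compared with the printed amino acid row. The function names `translate` and `mismatches` are illustrative only and do not come from the application. A reported disagreement shows only that the two rows differ as transcribed; it does not say which row is correct.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard nuclear code applies to these cDNA sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(codon_row):
    """Translate a row of codons (whitespace ignored) to one-letter amino acids.

    Codons containing characters outside T/C/A/G are reported as 'X'.
    """
    seq = "".join(codon_row.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def mismatches(printed_row, codon_row):
    """Return (position, printed residue, translated residue) where the rows differ.

    Positions are 1-based within the row, not within the full protein.
    """
    printed = "".join(printed_row.split())
    translated = translate(codon_row)
    return [(i + 1, p, t)
            for i, (p, t) in enumerate(zip(printed, translated))
            if p != t]


# First coding row of one of the figures above.
print(translate("ATG GCG GCG GCT GGG CGG CGC GGT CTG CTT TTG CTC TTT GTA"))
# -> MAAAGRRGLLLLFV

# Last coding row of the figure immediately above, as transcribed.
print(mismatches("W L H E *", "TGG CTC CAT CAA TAA"))
# -> [(4, 'E', 'Q')]
```

A check of this kind is useful mainly for locating transcription damage in either row before the sequences are used for anything else.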
AATTCGGMWCM KKKGWGG WG CCGGTGGAGTG AG AGGATGGG CGAGCAGTCTGAATG CCAG A ATG GAT AAC CGT 75

FATA FVIACVLSLI STIYMA 24 
T7T GCT ACT GCG TTT GTG ATT GCT TGT GTG CTT ACT CTG ATT TCC ACC ATC TAC ATG GCG 135 

ASIGTDFWYEYRSP IQENSS 44 
GCC TCC ATA GGC ACG GAC TTC TGG TAT GAG TAT CGA AGT CCC ATT CAA GAG AAT TCA AGT 195

DSNKIAWEDFLGDEADEKTY 64 
GAC TCG AAT AAA ATC GCC TGG GAA GAT TTC CTC GGT GAC GAG GCG GAT GAG AAG ACT TAC 255

NDVLFRYNGSLGLWRRCITI 84 
AAC GAT GTT CTG TTC CGA TAC AAC GGC AGC TTG GGG CTG TGG AGA CGG TGC ATC ACC ATA 315 

PKNTHWYAPPERTES FDVVT 104 
CCC AAA AAC ACT CAC TGG TAT GCG CCA CCG GAA AGG ACA GAG TCA TTT GAT GTG GTT ACC 375 

KCMS FTLNEQFMEKYVDPGN 124 
AAA TGC ATG AGT TTC ACA CTA AAC GAG CAG TTC ATG GAG AAG TAT GTG GAC CCC GGC AAC 43 5 

HNSG I D LLRTYLWRCQFLLP 144 
CAC AAT AGC GGC ATC GAC CTG CTT CGC ACC TAC CTG TCG CGC TGC CAG TTC CTT TTA CCC 4 95 

FVSLGLMCFGALIGLCACI C 164 
TTC GTC AGC TTG GGC TTG ATG TGC TTT GGG GCG TTG ATT GGC CTC TGT GCC TGT ATC TGC 555 

RSL Y PTLATG TLHLLAGLCT 184 
CGC AGC CTG TAT CCC ACC CTC GCC ACT GGC ATT CTC CAT CTC CTT GCA GGT CTG TGC ACA 615 

LG S V S C YVAG I ELLHQ KVE L 204 
CTG GGC TCC GTG AGT TGC TAT GTT GCC GGC ATT GAA CTC TTA CAT CAG AAA GTA GAG CTG 675 

PKDVSGEFGWSFCLACVSAP 224 
CCC AAG GAT GTA TCT GGA GAA TTT GGA TGG TCC TTC TGC CTG GCC TGC GTC TCG GCT CCC 73 5 

LQFMAAALFIWAAHTNRKEY 244 
TTA CAG TTC ATG GCG GCC GCT CTC TTC ATC TGG GCT CCC CAC ACC AAC CGG AAA GAG TAC 7 95 

TLMKAYRVA* 254 
ACC TTA ATG AAG GCT TAT CGT GTG GCA TGA 825 

ACCGACGCTGCCTGCTTAATGATTAATATTTTTCATACATTTTTTT 871 



F/6. ifc 
HUMAN TANGO 215

Input file tag215/ Output File tag215.pat Sequence length 2747 

ME LGCWTQLG 10 

TCCCCAGTAGACGCTCCGGCACCAGCCGCGGCAAGG ATG GAG CTG GGT TGC TGG ACG CAG TTG GGG 66 

LTFLQLLLISSLPREYTVIN 30

CTC ACT TTT CTT CAG CTC CTT CTC ATC TCG TCC TTG CCA AGA GAG TAC ACA GTC ATT AAT 126 

EACPGAEWNIMCRECCEYDQ 50 

GAA GCC TGC CCT GGA GCA GAG TGG AAT ATC ATG TGT CGG GAG TGC TGT GAA TAT GAT CAG 186 

IECVCPGKREVVGYTIPCCR 70

ATT GAG TGC GTC TGC CCC GGA AAG ACG GAA GTC GTG GGT TAT ACC ATC CCT TGC TGC AGG 24 6 

NEENECDSCLI HPGCTI FEN 90 

AAT GAG GAG AAT GAG TGT GAC TCC TGC CTG ATC CAC CCA GGT TGT ACC ATC TTT GAA AAC 306 

CKSCRNGSWGGTLDDFYVKG 110 

TGC AAG AGC TGC CGA AAT GGC TCA TGG GGG GGT ACC TTG GAT GAC TTC TAT GTG AAG GGG 366 

FYCAECRAGWYGGDCMRCGQ 130 

TTC TAC TGT GCA GAG TGC CGA GCA GGC TGG TAC GGA GGA GAC TGC ATG CGA TGT GGC CAG 426 

VLRAPKGQILLESYPLNAHC 150

GTT CTG CGA GCC CCA AAG GGT CAG ATT TTG TTG GAA AGC TAT CCC CTA AAT GCT CAC TGT 486 

EWTI HAKPGFVIQLRFVMLS 170 

GAA TGG ACC ATT CAT CCT AAA CCT GGG TTT GTC ATC CAA CTA AGA TTT GTC ATG TTG AGC 54 6 

LEFDYMCQYDYVEVRDGDNR 190 

CTG GAG TTT GAC TAC ATG TGC CAG TAT GAC TAT GTT GAG GTT CGT GAT GGA GAC AAC CGC 606 

DGQ I I KRVCGNERPA P I Q S I 210 

GAT GGC CAG ATC ATC AAG CGT GTC TGT GGC AAC GAG CGG CCA GCT CCT ATC CAG AGC ATA 666 

GSSLHVLFHSDGSKNFDGFH 230

GGA TCC TCA CTC CAC GTC CTC TTC CAC TCC GAT GGC TCC AAG AAT TTT GAC GGT TTC CAT 726 

AIYEE ITACSSSPCFHDGTC 250 

GCC ATT TAT GAG GAG ATC ACA CCA TGC TCC TCA TCC CCT TGT TTC CAT GAC GGC ACG TGC 786 

VLDKAGS YK CACLAGYTGQR 270 

GTC CTT GAC AAC GCT GGA TCT TAC AAG TGT GCC TGC TTG CCA CGC TAT ACT GGG CAG CGC 846 

CENLLEERNCSDPGGPINGY 290 

TGT GAA AAT CTC CTT GAA GAA AGA AAC TCC TCA GAC CCT GGG GGC CCC ATC AAT GGG TAC 906 

QKITGGPGLINGRHAKIGTV 310 

CAG AAA ATA ACA CGG GCC CCT GGG CTT ATC AAC CGA CGC CAT GCT AAA ATT GGC ACC GTT 966 

VSFFCYNSYVLSGNEKRTCQ 330 
GTG TCT TTC TTT TGT TAC AAC TCC TAT CTT CTT AGT CGC AAT GAG AAA AGA ACT TGC CAG 1026 

QNGEHSGKQPICIKACREPK 350 
CAG AAT GGA GAG TGG TCA GGG AAA CAG CCC ATC TGC ATA AAA GCC TGC CGA GAA CCA AAG 1086 

I SDLVRRRVLPMQVQSRETP 3 70 
ATT TCA CAC CTG GTG AGA AGG AGA GTT CTT CCG ATG CAG CTT CAG TCA ACG GAG ACA CCA 114 6 
LHQLYSAAFSKQKLQSAPTK 390
TTA CAC CAG CTA TAC TCA GCG CCC TTC AGC AAG CAG AAA CTG CAG AGT GCC CCT ACC AAG 1206 

KPALPFGDLPMGY QHLHTQL 410 
AAG CCA GCC CTT CCC TTT GGA GAT CTG CCC ATG GGA TAC CAA CAT CTG CAT ACC CAG CTC 1266 

QYECISPFYRRLGSSRRTCL 430 
CAG TAT GAG TGC ATC TCA CCC TTC TAC CGC CGC CTG GGC AGC AGC AGG AGG ACA TGT CTG 1326 

RTGKWSGRAPSCIPICGKIE 450 
AGG ACT GGG AAG TGG AGT GGG CGG GCA CCA TCC TGC ATC CCT ATC TGC GGG AAA ATT GAG 1386 

NITAPKTQGLRWPWQAAIYR 470
AAC ATC ACT GCT CCA AAG ACC CAA GGG TTG CGC TGG CCG TGG CAG GCA GCC ATC TAC AGG 1446 

RTSGVHDGSLHKGAWFLVCS 490
AGG ACC AGC GGG GTG CAT GAC GGC AGC CTA CAC AAG GGA GCG TGG TTC CTA GTC TGC AGC 1506 

GALVNERTVVVAAHCVTDLG 510 
GGT GCC CTG GTG AAT GAG CGC ACT GTG GTG GTG GCT GCC CAC TGT GTT ACT GAC CTG GGG 1566 

KVTM I KTADLKVVLGKFYRD 530 
AAG GTC ACC ATG ATC AAG ACA GCA GAC CTG AAA GTT GTT TTG GGG AAA TTC TAC CGG GAT 1626 

DDRDEKTIQSLQISAI ILHP 550 
GAT GAC CGG GAT GAG AAG ACC ATC CAG AGC CTA CAG ATT TCT GCT ATC ATT CTG CAT CCC 1686 

NYDPILLDADIAILKLLDKA 570
AAC TAT GAC CCC ATC CTG CTT GAT CCT GAC ATC GCC ATC CTG AAG CTC CTA GAC AAG GCC 1746 

RISTRVQPICLAASRDLSTS 590
CGT ATC AGC ACC CGA GTC CAG CCC ATC TGC CTC GCT GCC AGT CGG GAT CTC AGC ACT TCC 1806 

FQESHITVAGWNVLADVRSP 610 
TTC CAG GAG TCC CAC ATC ACT GTG GCT GGC TGG AAT GTC CTG GCA GAC GTG AGG AGC CCT 1866 

GFKNDTLRSGVVSVVDSLLC 630 
GGC TTC AAG AAC GAC ACA CTG CCC TCT GGG GTG GTC AGT GTG GTG GAC TCG CTG CTG TGT 1926 

EEQH EDHGI PVSVTDNMFCA 650 
GAG GAG CAG CAT GAG GAC CAT GCC ATC CCA GTG AGT GTC ACT GAT AAC ATG TTC TGT GCC 1986 

SWEPTAPSDICTAETGGIAA 670 
AGC TGG GAA CCC ACT GCC CCT TCT GAT ATC TGC ACT GCA GAG ACA GGA GGC ATC GCG GCT 2046 

VSFPCRASPEPRWHLMGLVS 690 
GTG TCC TTC CCG CGA CGA GCA TCT CCT GAG CCA CGC TGG CAT CTG ATG GGA CTG GTC AGC 2106 

WSYDKTCSHRLSTAFTKVLP 710 
TCG AGC TAT GAT AAA ACA TGC AGC CAC AGG CTC TCC ACT GCC TTC ACC AAG GTG CTG CCT 2166 

FKDWIERNMK* 721 
TTT AAA GAC TGG ATT GAA ACA AAT ATG AAA TGA 2199 

ACCATGCTCATGCACTCCTTG AC AAGTGTTTCTGTATATCCGTCTGTACCTGTGTCATTGCGTG AANCACTGTCGCCCT 2278 

GAAGTGTGATTTGGCCTGTGAACTTCGCTGTGCCAGCGCTTCTGACTTCAGCGACAAAACTCAGTG AAGGGTGAGTAGA 23 57 

CCTCCATTGCTGGTACGCTGATCCCVCGTCCACTACTAGGACAGCCAATTCCAACATGCCACGCCTTGCAAGAAGTAAG 24 36 
TTTCTTCAAAGAAGACCATATACAAAACCTCTC 2515
GCCATaVGCTTGACCAGGGAAGATCTGGGCTTCATGAGGCCCCTTTTGAGGCTCTCAAGTTCT 2594 
G ACAGCCCAGGG CAGCAGAGCTGGGATGTGGTG CATG CCTTTGTGTACATGG CCACAGTACAG TCTGGTCCTTTTCCTT 2673 
CCCCATCTCTTGTACACATTTTAATAAAAT^ 2747 
GTCGACCCACGCGTCCGGCGGCTAGGCCCCCGTGCGCTGGAGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGC 79

MGG PRGAGWVAA 12 
C CGGCGC7GCGG CTCTGCCGCGG CG G CAG C ATG GGT GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG 145 

GLLLGAGACYCI YRLTRGRR 32 
GGC C7G CTG CTC GGC GCG GGC GCC TGC TAC TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG 205 

RGDRELG IRSSKSAGALEEG 52 
CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG 265 

TSEGQLCGRSARPQTGGTWE 72 
ACG TCA GAG GGT CAG TTG TGC GGG CGC TCG GCC CGG CCT CAG ACG GGA GGT ACC TGG GAG 325 

SQWS KTSQPEDLTDGSYDDV 92 
TCA CAG TGG TCC AAG ACC TCG CAG CCT GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT 38S 

LNAEQLQKLLYLLESTEDPV 112 
CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA 4 45 

I IERAL I TLGNNAAFSVNQA 132 
ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT 505 

I IRELGG IPIVANKINHSNQ 152 
ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG 565 

SIKEKALNALNNLSVNVENQ 172 
AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA $25 

IKIKIYISQVCEDVFSGPLN 192
ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG AAC 685 

SAVQ LAGLTLLTNMTVTNDH 212 
TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC 74 5 

QHMLHSY ITDLFQVLLTGNG 232 
CAG CAC ATG CTT CAC AGT TAC ATT ACA GAC CTC TTC CAG GTG TTA CTT ACT GGA AAT GGA 805 

NTKVQVLKLLLNLSENPAMT 252 
AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG TCT GAA AAT CCA GCC ATG ACA 865 

EGLLRAQVDSSFLSLYDSHV 272 
GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TCC CTT TAT GAC AGC CAC GTA 925 

AKEI LLRVLTLFQNI KNCLK 292 
GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAC AAT ATA AAG AAC TGC CTC AAA 985 

I ECHLAVQPTFTEGSLFFLL 312 
ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA 104 5 

MGEECAQ K I RALVDHHOAEV 332 
CAT GGA GAA GAA TGT CCC CAG AAA ATA AGA GCT TTA GTT CAT CAC CAT CAT GCA GAG GTG 1105 

KEKVVTIIPKI* 344
AAC GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1141 

TTCCTCATA7TTTTCCAAACACT A/\TGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGCGGATTCTCCCAG 1220 
CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACA<^ 1299
ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTCAAA 13 73 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTAC 1457 
ACTCATCTGAGACAGCATCAGTATTTGACrAAATCATTGTTTCACAACT 1 53 6 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1615 
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAAT^ 1694 
TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1773 
CCGTGCTGGGCGCGGTGGCTC7TGCCTGTAATCCCAGCACTTTGGG AGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 1852 
GTTTGAGACCAAGCCTGACCAATATCGAGAAACCCTGTCTCTACTAAGAATACA^ 1931 
G CCTGT AATCCCAGCTACTTGGG AGG C CGAGGCAGGAG AATTGCTTG AACCCGGGAGG CAGAGGTTGCAGTG AGGTGAG 2010 
ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCWlAAAAAAAA^ 2089 
TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2168 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2247 
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2326 
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAAAAGGGCCGCCGC 2403 
TCCGGTCCANGAAAAAGCTGCTTGCACTAGGGGCATCCCGCCT 79

GGCTTCCGATTTTAGCAGGGCGGCTTCCGGAAGGCGGAGCTCCAACCCCATTTCCTTTCTCTGGGCTGGTTCT 158 

M G G A R 5 

GCTGCACCTCCGTGTGGCCCTGGCTCCTCGGCTCCCTGCAGCTCCGAGGCAGCAGC ATG GGT GGC GCG CGG 229 

DVGWVAAGLVLGAGACYCIY 25 

GAC GTG GGC TGG GTG GCA GCA GGG CTC GTC CTG GGC GCC GGC GCC TGC TAC TGT ATC TAC 289 

R L T R G PRRGVATMRPSRSAE 45 

CGG CTG ACT CGG GGA CCG CGG CGA GGC GTC GCG ACC ATG CGC CCT TCG CGA TCC GCA GAA 349 

DLTDGSYDDILNAEQLKKLL 65

GAC CTA ACC GAT GGC TCC TAT GAC GAT ATC TTA AAT GCA GAG CAG CTT AAG AAA CTT CTG 40 9 

YLLESTDDPVITEK A L V T L G 85 

TAT CTG CTG GAG TCA ACC GAC GAT CCT GTC ATT ACT GAA AAG GCC TTG GTC ACC TTG GGA 469 

NNAAFSTNQAIIRELGGIPI 105

AAT AAT GCA GCC TTC TCC ACT AAC CAG GCC ATT ATT CGT GAG TTG GGT GGT ATC CCA ATT 529 

VGN K I NSLNQS I KEKALNAL 125 

GTT GGA AAC AAA ATC AAC TCC CTG AAC CAA AGT ATT AAA GAG AAA GCT TTA AAT GCA CTG 589 

NNLSVNVENQTKIKIYVPQV 145

AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ACT AAG ATA AAG ATA TAC GTC CCT CAA GTC 64 9 



CEDVFAD 152
TGT GAG GAC GTC TTT GCT GAC 670
10 20 30 40 50

HUMAN MALLSRPALT L L L LLMAA WRCQ EQAQTT DWRAT L KT I RNG VHK I DT YLNAALDLL

MURINE M-VTPP.PAPAKGPALLLLLLLATARGQEQDQTTDWRATLKTIRNGIKKIDTYLNAALDLL
10 20 30 40 50 

60 70 80 90 100 110 

GGEDGLCQYKCSDGSKPFPRYGYKPSPPNGCGSPLFGVHLNIGIPSLTKCCNQHDRCYET 

GGEDGLCQYKCSDGSKPVPRYGYKPSPPNGCGSPLFGVHLNIGIPSLTKCCNQHDRCYET 
60 70 80 90 100 110 

120 130 140 150 160 170 

CGKSKNDCDEEFQYCLSKICRDVQKTLGLTQHVQACETTVELLFDSVIHLGCKPYLDSQR 



CGKSKNDCDEEFQYCLSKICRDVQKTLGLSQNVQACETTVELLFDSVIHLGCKPYLDSQR 
120 130 140 150 160 170 

180 190 
: AACRCHYEEKTDL 



AACWCRYEEKTDL 
180 190 
10 20 30 40 50 60

H^AlPS ^QLGAVVAVASSFFCASLFSAVHKIEEGHIGVYYRGGALLTSTSGPGFHLMLPFITSYK 

HUMAN MAQLGAVVAVASSFFCASLFSAVHKIEEGHIGVYYRGGALLTSTSGPGFHLMLPFITSYK
10 20 30 40 50 60 

70 80 90 100 110 120 

SVQTTLQTDEVX^ryPCGTSGGVMIYFDRIEVVTIFLVPNAVYDIVKJIYTADYDKAI.IFNKI 

SVQTTLQTDEVKMVPCGTSGGVMIYFDRIEWNFLVPNAVYDIVKNYTADYDKALIFNKI 
70 80 90 100 110 120

130 140 150 160 170 180

KKELNQFCSVHTLQEVYIELFDQIDENLKLALQQDLTSMAPGLVIQAVRVTKPNIPEAIR 

HKELNQFCSVHTLQEVYIELFDQIDENLKLALQQDLTSMAPGLVIQAVRVTKPNIPEAIR 
130 140 150 160 170 180 

190 200 210 220 230 240 

RNYELMESEKTKLLIAAQKQKWEKEAETERKKALIEAEKVAQVAEITYGQKVMEKETEK 

RNYELMES EKTKLL IAAQKQKWEKEAETERKKAL I EAEKVAQVAE IT YGQKVMEKETEK 
190 200 210 220 230 240 
10 20 30 40 50 60

HUMAN MNMTQARVLVAAVVGLVAVLLYASIHKIEEGHLAVYYRGGALLTSPSGPGYHIMLPFITT

MURINE



70 80 90 100 110 120 

FRSVQTTLQTDEVKNVPCGTSGGVMIYIDRIEWNMLAPYAVFDIVRNYTADYDKTLIFN 

KiW PCGTSGGVMI YIDRI EWNMLA PYAVFD I VRNYT AD YDKTL I FN 

10 20 30 40 

130 140 150 160 170 180

KIHHELNQFCSAHTLQEVYIELFDQIDENLKQALQKDLNLMAPGLTIQAVRVTKPKIPEA 

KIHHELNQFCSAHTLQEVYIELFDQIDENLKQALQKDLNTMAPGLTIQAVRVTKPKIPEA 
50 60 70 80 90 100 

190 200 210 220 230 240 

IRRNFELMEAEKTKLLIAXQKQKWEKEAETERKKAVIEAEKIAQVAKIRFQQKVMEKET 

IRRNFELMEAEKTKLLIAAQKQKWEKEAETERKRAVIEAEKIAQVAKIRFQQKVMEKET 
110 120 130 140 150 160 

250 260 270 280 290 300 

EKRISSIEDAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKIYFG 

EKRISEIEDAAFLAREKAKADAEYYAAHKYATSNKHKLTPEYLELKKYQAIASNSKIYFG 
170 180 190 200 210 220 

310 320 330 340 

SNXPNMFVDSSCALKYSDIRTCRESSLPSKEALEPSGENVIQNKESTG- 



SNIPSMFVDSSCALKYSDGRTGREDSLPPEEAREPSGESPIQNKENAGN 
230 240 250 260 270 
10 20 30 40 50 60

MURINE MKLLCL VA WGC LL VP PAQANKSS EDI RCKC IC P PYRNI SGHI YNQNVSQKDCNCLHWE

HUMAN MKLLSLVAWGCLL VP PAEAMKSS EDI RCKC ICP PYRNI SGHIYNQWSQKDCNCLH WE

10 20 30 40 50 60 

70 80 90 100 110 120 

PMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSWGALLLYMAFLMLVDPLIRKPD 

PMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSWGALLLYMAFLMLVDPLIRKPD 
70 80 90 100 110 120 

130 140 150 160 170 180 

AYTEQLHNEEENEDARTMATAAASIGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDRHK 

AYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDRHK 
130 140 150 160 170 180 



MLS 
MLS 



FIG, ZS 
10 20 30 40 50

MATLW-GGLLRLGSLLSLSCLALSVLLLAQLSDAAKNFEDVRCKCICPPYKENSGHIYNK 

MASLWCGNLLRLGSGLSMSCLALSVLLLAQLTGAAKNFEDVRCKCICPPYKENPGHIYNK 
10 20 30 40 50 60 

60 70 80 90 100 110 

NISQKDCDCLHVVEPMPVRGPDVEAYCLRCECKYEERSSVTIKVTIIIYLSILGLLLLYM 

NISQKDCDCLHVVEPMPVTIG PDVEA YCLRCECKYEERSSVTIKVTI 1 1 YLS ILGLLLLYM 
70 80 90 100 110 120 

120 130 140 150 160 170

VYLTLVEPILKRRLFGHAQLIQSDDDIGDHQPFANAHDVLARSRSRANVLNKVEYAQQRW 

VYLTLVEPILKRRLFGHSQLLQSDDDVGDHQPFANAHDVIARSRSRANVLNKVEYAQQRW 
130 140 150 160 170 180 

180 190 

KLQVQEQRKSVFDRHWLS 



KLQVQEQRKSVFDRHWLS 
190 



HC/MAN 
10 20 30 40 50 60

HUMAN MIRCGLACERCRWILPLLLLSAIAFDIIAIiAGRGWLQSSDHGQTSSLWWKCSQEGGGSGS

M ^6 MLRCGLACERCRWILPLLLLSAIAFDIIALAGRGWLQSSNHIQTSSLWWRCFDEGGGSGS 
10 20 30 40 50 60 

70 80 90 100 110 120 

YEEGCQSLMEyAWGRAAAAMLFCGFIILVICFILSFFALCGPQMLVFLRVIGGLLALAAV 

YDDGCQSLMEYAWGRAAAATLFCGFIILCICFILSFFALCGPQMLVFLRVIGGLLALAAI 
70 80 90 100 110 120 

130 140 150 160 170 180

FQIISLVIYPVKYTQTFTLHANPAVTYIYNWAYGFGWAATIILIGCAFFFCCLPNYEDDL 

FQIISLVIYPVKYTQTFRLHDNPAVNYIYNWAYCFGWAATIILIGCSFFFCCLPNYEDDL 
130 140 150 160 170 180 

190 

LGNAKPRYFYTSAN 



LGAAKPRYFYPPAN 
190 



Fid- 11 
10 20 30 40 50

MURINE MAG I PGL - FILLVLLCVFMQVSP YTVPWKPTWPA YRL P WLPQSTLNLAKADFDAXAKLE

HUMAN MAGIPGLLr LLFFLLCAVGQVSPV3APWKPTWPA7RLPWLPQSTLNLAKPDFGA2AKLE

10 20 30 40 50 60 

60 70 80 90 100 110 

VSSSCG PQCHKGTPLPTYSEAKQYLSYETLYANGSRTETRVGIYILSNGEGRARGRDSEA 

VSSSCGPQCHKGTPLPTY2EAKQYLSYETLYANGSRTETQVGIYILSSSGDGAQKRDSGS 
70 80 90 100 110 120 

120 130 140 150 160 170 

TGRSRRKRQIYGYDGRFSIFGKDFLLNYPFSTSVKLSTGCTGTLVAEKHVLTAAHCIHDG 



SGKSRRKRQIYGYDSRFSIFGKDFLLNYPFSTSVKLSTGCTGTLVAEKHVLTAAHCIHDG 
130 140 150 160 170 180 

180 190 200 210 220 230 

KTYVKGTQKLRVGFLKPKYKDGAEGDNSSSSAMPDKMKFQWIRVKRTHVPKGWIKGNAND 

KTYVKGTQKLRVGFLKPKFKDGGRGANDSTSAMPEQMKFQWIRVKRTHVPKGWIKGNAND 
190 200 210 220 230 240 

240 250 260 270 280 290 

IGMDYDYALLELKKPHKRQFMKIGVSPPAKQLPGGRIHFSGYDNDRPGNLVYRFCDVKDE 

IGMDYDYALLELKKPHKRKFMKIGVSPPAKQLPGGRIHFSGYDNDRPGNLVYRFCDVKDE 
250 260 270 280 290 300 

300 310 320 330 340 350 

T YDLL YQQCDAQ PGASGSGVYVRMWKRPQQKWERK I IG I FSGHQWVDMNGS PQDFNVAVR 

TYDLLYQQCDAQPGASGSGVYVRMWKRQQQKWERKIIGIFSGHQWVDMNGSPQDFNVAVR 
310 320 330 340 350 360 

360 370 380 

ITPLKYAQICYWIKCMYLDCREG 



ITPLKYAQ ICYWI KG W Y I DC R EC 
370 380 
10 20 30 40 50

HUMAN MAP AS R L L AL WALAA VAL PG S G AEG DGGWR PGG ?G AVAEEERCTVERRADLT



MURINE MAAAGRRGLLLLF VLWMMVTVILPAS GEGGWKQNGLGIAAAVMEEERCTVERRAHIT

10 20 30 40 50 

60 70 80 90 100 110 

YAEFVQQVAFVRPVILQGLTDNSRFRALCSRDRLLASFGDRWRLSTAr^TYSYHKVDLPF 



YSEFMQHYAFLKPVILQGLTDMSKFRALCSRENLLA3FGDNIVRLSTAWTYSYQKVDLPF 
60 70 80 90 100 110 

120 130 140 150 160 170 

QEYVEQLLHPQDPTSLGNDTLYFFGDNNFTEIVASLFRHYSPPPFGLLGTAPAYSFGIAGA 



QEYVEQLLQPQDPASLGNDTLYFFGDNNFTEWASLFQHYSPPPFRLLGTTPAYSFGIAGA 
120 130 140 150 160 170 

180 190 200 210 220 230

GSGVPFHWHGPGYSEVIYGRKRWFLYPPEKTPEFHPNKTTLAV^LRDTYPALPPSARPLEC 



GSGVPFHWHGPGFSEVIYGRKRWFLYPPEKTPEFHPNKTTLAWLLEIYPSLALSARPLEC 
180 190 200 210 220 230

240 250 260 

TIRAGEVLYFPDRWWHATLNLDTSVFISTFLG 



TIQACEVLYFPDRWWHATLNLDTSVFISTFLG 
240 250 260 



FIG. 2? 
10 20 30 40 50 60

HOM^W MDNRFATAFVIACVLSLISTIY>LAJ\SIGTDFVrf EYHSPVQENSSDLNKSIWDEFISDEAD 



/MU^iM^ MDNRFATAFVIACVLSLI5TIYi4AASIGTDF T /VVEV?.S?rQENSSDSNKIAWEDFLGOEAD 
10 20 30 40 50 60 

70 80 90 100 110 120 

ECTyNDALFRYNGTVGLWRRCrTIPKNMHOTSPPERTESFDV-VTKCVSFTLTEQFMEKFV 



EKTYNDVLFRYNGSLGLWRRCITIPKNTHOTAPPERTESFDVVTKCMSFTLNEQFMEXYV 
70 80 90 100 110 120

130 140 150 160 170 180 

DPGNHNSGIDLLRTYLWRCOFLLPFVSLGLHCFGALIGLCACICRSLYPTIATGILHLLA 



D PGNHNSGIDLLRTYLWRCQFLLPr VSLGLMCFGALIGLCACICRSLYPTLATGILHLLA 
130 140 150 160 170 180 

190 200 210 220 230 240 

GLCTLGSVSCYVAGIELLHQKLELPDNVSGEFGWSFCLACVSAPLOFMASALFIWAAHTN 



GLCTLGSVSCYVAGIELLHQKVELPKDVSGEFGWS FCLACVSAPLQFMAAALFIWAAHTN 
190 200 210 220 230 240 

250 

RKEYTLMKAYRVA 



RKEYTLMKAYRVA 
250 



WO 00/18904 46/112 PCT/US99/22817



10 20 30 40 50 

MURINE MGGARDVGWVAAGLVLGAGACYCIYRLTRGPRRGVATM--RPSRSAEDLTDGSYDDILNA

HUMAN MGGPRGAGWVAAGLLLGAGACYCIYRLTRGRRRGDRELGIRSSKSAEDLTDGSYDDVLNA
10 20 30 40 50 60 

60 70 80 90 100 110 

EOLKKLL YLLESTDDPV ITEKALVTLGNNAAFSTNQAI IRELGGI ? I VGNKIMSLNQSIK 

EQLQKLLYLLESTEDPVIIERALITLGNNAAFSVNQAIIRELGGIPIVANKINHSNQSIK 
70 80 90 100 110 120 

120 130 140 150 
EKALNALNNLS VNVENQT K I K I YVPQ VC EDVF A 



EKALNALNNLSVNVENQIKIKIYISQVCSDVFSGPLNSAVQLAGLTLLTNMTVTNDHQKM 
130 140 150 160 170 180 



LHSYITDLFQWLTGNGNTKVQXLKLLLNLAENPAMTEGLLRAOVDSSFLFLYDXHVAXE 
190 200 210 220 230 240 



D 

XLLQYLRFSE 
250 
humutntalign

ALIGN calculates a global alignment of two sequences 

version 2.0u Please cite: Myers and Miller, CABIOS (1989)

> mut180 1570 aa vs.
> hut180 1203 aa
scoring matrix: pam120.mat, gap penalties: -12/-4
55.0% identity; Global alignment score: 2219 

10 20 30 40 50 
GTCGACCCACGCGTCCG- - - GGCCGGGGTCCTG A GCCGGAGCCGGAGCGCGCGCC 

GTCGACCC ACGCGTCCG CG TGG ATATGG AG CTGGCTGCTG CCAAGTCC GGGGCCCG CGCC 
10 20 30 40 50 60 

60 70 80 90 

GCTGCCCAGC CC CGC CGCGCCG -GCCCCGCAG AT -GGTGACT 



GCTGCCTAGCGCGTCCTGGGGACTCTGTGGGGACGCGCCCCGCGCCGCGGCTCGGGGACC 
70 80 90 100 110 120 

100 110 120 130 

C CGCGGCCCGC - - - GCCC - GCCCGGG - GCCCCGCGCTC - - - CTCCTCCT 



CGTAGAGCCCGGCGCTGCGCGCATGGCCCTGCTCTCGCGCCCCGCGCTCACCCTCCTGCT 
130 140 150 160 170 180 

140 150 160 170 180 190 

CCTGCTGCTCGCCACTCCGCGCGGG- - -CAGCAACACGACCAGACCACCGACTGGAGGGC 



CCTCCTCATGGCCGCTGTTGTCAGGTGCCAGGAGCAGGCCCAGACCACCGACTGGAGAGC 
190 200 210 220 230 240 

200 210 220 230 240 250 

CACCCTCAAGACCATCCCCAACGGCATCCACAAGATAGACACGTACCTCAACGCCGCGCT 



CACCCTGAAGACCATCCGGAACGGCGTTCATAAGATAGACACGTACCTGAACGCCGCCTT 
250 260 270 280 290 300 

260 270 280 290 300 310 

GGACCTGCTGGGCGGGGACGACGGGCTCTGCCAGTACAAGTGCAGCGACGGATCGAAGCC 



GGACCTCCTGGGACCCGACGACGGTCTCTGCCAGTATAAATGCAGTGACGGATCTAAGCC 
310 320 330 340 350 360 

320 330 340 350 360 370 

TGTTCCACGCTATGGATATAAACCATCTCCACCAAATGGCTGTGGC7CTCCACTGTTTGG 



TTTCCCACG7TATGGTTATAAACCCTCCCCACCGAATGGATGTCGCTCTCCACTGTTTCG 
370 380 390 400 410 420 

380 390 400 410 420 430 

CGTTCATCTGAACATAGGTATCCCTTCCCTCACCAAGTGCTGCAACCACCACCACAGATG 

rCTTCATCTTAACATTGGTATCCCTTCCCTGACAAAGTGTTCCAACCAACACGACAGGTG 
430 440 450 460 470 480 

440 450 460 470 480 490

ctatgagacctgcccgaaaagcaagaacgactgtgacgacgag7tccagtactccctctc 
:tatgagacctctcgca,\aagca,\gaatcactgtgatgaagaattccagtattgcctctc 

490 500 510 520 530 540



F/6. 3 2. C(of3) 
500 510 520 530 540 550

CAAGATCTGCAGAGACGTGCAGAAGACGCTCGGACTATCTCAGAACGTCCAGGCATGTGA 

CAAGATCTGCCGAGATGTACAGAAAACACTAGGACTAACTCAGCATGTTCAGGCATGTGA 
550 560 570 580 590 600 

560 570 580 590 600 610 

GACAACGGTGGAGCTCCTCTTTGACAGCGTCATCCATTTAGGCTGCAAGCCATACCTGGA 

AACAACAGTGGAGC7CTTGTTTGACAGTGTTATACATTTAGGTTGTAAACCATATCTGGA 
610 620 630 640 650 660 

620 630 640 650 660 

CAGCCAGCGGG CTGCATGCTGGTG TCGTTATGAAGAAAAAACAGATCTATAAAG ACC 

CAGCCAACGAGCCGCATGCAGGTGTCATTATGAAGAAAAAACTGATCTTTAAAGGAGATG 
670 680 690 700 710 720 

670 680 690 700 710 720 

CTGACTGCTGGAGAGCAGGCGAGAATGGAGGATCAT - CCTT - GCCAAAGATCGGATGCTT 

CCG ACAGCTAGTGA - CAGATGAAG ATGGAAGAACATACCTTTGACAAATAACTAATGTTT 
730 740 750 760 770 

730 740 750 760 770 780 

TAACAGCCTAATGTTGCCTTAGTTTTGTGTCGATGGGTCATTTTGAGACCTTTCTATACT 

TTACAACATAAAACTGTCTTATTTTTGTG- -AAAGGATTATTTTGAGACCTTAAAATA- - 
780 790 800 810 820 830

790 800 810 820 830 840 

GTGTCTTTTTTTAGAACCTCAAAGTGAAAACGGTGGGGGGCCAGGCAGAAACAGAGGGAG 

ATTTATAT CTTGATGTTAAAACCT - - CAAAGCAAAAAAAGTGAGGG 

840 850 860 870 

850 860 870 880 890 900 

AGCATGCTTGGGATGGCGAGCGAGCAGGACATCCAAGAGCATGCCTTCCTGAGACTCGCT 



AGATAG TG ACGGGAGGGCA - - - C GCTTGTCTTC 

880 890 900 

910 920 930 940 950 960 

GTCTTGGTGGCTCCCCCAAACTGGGAAGAAAAGCf TAAGCTCGTGTGACTTGGTGTTCAT 

- TCA - GGTATCTTCCCCA GC ATT - GCTC CCTT A CTT 

910 920 930 940 

970 980 990 1000 1010 1020 

AGTTGTACTTAACAATAAAAATGAAAGCAAATGTAAAATTCATTGTAAGGACTTTTCAGC 



AGTA - TGC CAAATGT CTT 

950 

1030 1040 1050 1060 1070 1080

ATTATTTTATTTTCAAATACACGCCAATCTTCCCTTAGAACTATTATTTATTTTGAAATT 

GACCAAT-ATC- - - AAAAACAAGTGCTTGTTTAG 

960 970 980
1090 1100 1110 1120 1130 1140

TCAGATGTACATTTATACCTGGAAAAACTATTAATTCTCCATTTTTATTATACATAATGT 



- CGGA - GAATTTTGAAAAGAGGAATA TATAACTCAATTTT - 

990 1000 1010 1020 

1150 1160 1170 1180 1190 1200

G TTGTTTCTCTG AAG CCC ACTAAG ATAGG TATAAATATGTT ACTCAAAACTACACGGTTT 

CAC AAC - - CACATTTA 

1030 1040 

1210 1220 1230 1240 1250 1260 

CCAAATGTGCATCTCTTGTACAGTTGGAATCACGGTTGGTACTTCTCTGGAGAGACGCCC 



CCAAA AAAAGAGATCAAATATAAAATT 

1050 1060 

1270 1280 1290 1300 1310 1320 

CAGG ACATCTGAGTG TTGGGATGTG CACAGAATTCAGAAG CCCAGCTTCCTGT CTCACAA 



CATCATAATGT CTGTT - - - CAACAT - - TATCT 

1070 1080 1090 

1330 1340 1350 1360 1370 1380 

ACCGCTTAGAGTGAATGTCCTTCCTCTCCTGCTGTGAGCTCTAGGAATGACGGGTTTAAC 



TATTTG G AAAATGGGGAAATTATC 

1100 1110 

1390 1400 1410 1420 1430 1440 

GGGCCAAGCCGAGCTCTGAATCAGTGCGCTATCTGCTGCTGAGGTTGTGGTTACTCCCTC 



A CTTACA AGTATTTGTTTACT 

1120 1130 1140 

1450 1460 1470 1480 1490 1500 

ATCCCCGTTTTCCATCTTCTATCCTGGAGTAGTGTTAAAACTCTGACATTTTCTAATGGA 



ATGAAAT - TTTAAATAC - - ACATTT 

1150 1160

1510 1520 1530 1540 1550 1560 

uCTCTTAATAAAAGCTATTTACTTCTTGGTAAAAAAAAAAAAAAAAAAAAAAAAAAGGGC 



ATGC CTAG AAAAAAAAAAAAAAAAAAAAAAAGCGC 

1170 1180 1190

1570 
uGCCG * 

ZGCCGC 
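
The ALIGN output above reports a global alignment score and a percent identity for the murine and human TANGO 180 nucleotide sequences, using gap penalties of -12/-4. The following is a minimal sketch of a global alignment with affine gap penalties, written as a standard three-state dynamic program. It makes several assumptions that are not taken from the application: it uses a simple match/mismatch score (+5/-4) instead of the pam120 matrix, it charges -12 for the first gapped position and -4 for each further position in the same gap, it computes identity as identical columns divided by alignment length, and it stores full tables rather than using the linear-space method of Myers and Miller. The name `global_align` is illustrative.

```python
NEG = float("-inf")


def global_align(a, b, match=5, mismatch=-4, gap_open=-12, gap_extend=-4):
    """Global alignment with affine gaps (illustrative scoring, see lead-in).

    Returns (score, aligned_a, aligned_b, percent_identity).
    """
    n, m = len(a), len(b)
    # M: column pairs a[i-1] with b[j-1]; X: a[i-1] over a gap; Y: gap over b[j-1].
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    tables = {"M": M, "X": X, "Y": Y}
    state = max(tables, key=lambda k: tables[k][n][m])
    score = tables[state][n][m]

    # Trace back from the end, rebuilding the aligned rows in reverse.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = max(tables, key=lambda k: tables[k][i][j])
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            prev = {"M": M[i - 1][j] + gap_open,
                    "X": X[i - 1][j] + gap_extend,
                    "Y": Y[i - 1][j] + gap_open}
            i -= 1
            state = max(prev, key=prev.get)
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            prev = {"M": M[i][j - 1] + gap_open,
                    "Y": Y[i][j - 1] + gap_extend,
                    "X": X[i][j - 1] + gap_open}
            j -= 1
            state = max(prev, key=prev.get)

    aligned_a = "".join(reversed(out_a))
    aligned_b = "".join(reversed(out_b))
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q)
    identity = 100.0 * same / len(aligned_a) if aligned_a else 0.0
    return score, aligned_a, aligned_b, identity


score, top, bottom, pct = global_align("ACGTACGT", "ACGTCGT")
print(score, top, bottom, pct)
# -> 23 ACGTACGT ACGT-CGT 87.5
```

Because the scoring scheme here differs from pam120, the scores it produces are not comparable with the value of 2219 reported above; the sketch only illustrates how a gap-penalized global score and a percent identity arise from the same alignment.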
10 20 30

TANGGATCGACCACGCGTYCGCCCACGCGT 

ACGCGTCCGCGGACGCGTGGGCGCGGACTGATGGCGTCATCGAAGCGACTGGCCCGGAAG 
10 20 30 40 50 60 

40 50 60 70 80 

CCGGTCGCGTGCTGAGGGGTGTGACGGTTT--TC- -TTGCTCGTGGGCTCGGACGAGTAC 

GAAGTAGGGTGCTGAGGGGTGTGGCGGTTTCTACGGTTGCACGGGGGTTCGGCTGTGTAC 
70 80 90 100 110 120 

90 100 110 120 130 140 

GGAGCGCCTCCAGGGACAGCCTGGATAAAGGCTCACTGATGGCTCAGTTGGGAGCAGTTG 

GGAGCGCCTGGAGGGACAGCCTGGATACAGGTTCACTGATGGCTCAGTTGGGAGCTGTTG 
130 140 150 160 170 180 

150 160 170 180 190 200 

TGGCTGTGGCTTCCAGTTTCTTTTGTGCATCTCTCTTCTCACCTGTGCACAAGATAGAAG 

TGGCCGTGGCTTCCAGTTTCTTTTGTGCATCTCTCTTCTCAGCTGTGCACAAGATAGAAG 
190 200 210 220 230 240 

210 220 230 240 250 260 

AGGGACATATTGGGGTATATTACAGACGCGGTGCCCTGCTGACTTCGACCACCGGCCCTG 

ACGGACATATTGGAGTATATTACAGAGGTGGTGCCCTGCTGACCTCCACCAGTCGCCCGG 
250 260 270 280 290 300 

- "0 280 290 300 310 320 

GTTTCCATCTCATGCTCCCTTTCATCACATCATATAAGTCTGTGCAGACCACACTCCAGA 

CTTTCCATCTCATGCTCCCCTTCATCACATCCTATAAGTCTCTACACACCACTCTCCAAA 
310 320 330 340 350 360

330 340 350 360 370 380

CAGATGAGGTC/\AGA.\TGTACCTTGTGGGACTAGTGGTGGTGTGATGATCTACTTTCACA 



f(6 S3 O" t 0 
CTGATGAAGTGAAGAACGTACCATGTGGAACCAGTGGTGGTGTGATGATCTACTTTGACA
370 380 390 400 410 420 

390 400 410 420 430 440 

GAATTGAAGTGGTGAACTTCCTGGTCCCGAACGCAGTGTATGATATAGTGAAGAACTATA 

GAATTGAAGTGGTGAACTTCCTGGTCCCAAATGCAGTGTATGATATAGTGAAGAACTATA 
430 440 450 460 470 480 

450 460 470 480 490 500 

CTGCTGACTATGACAAGGCCCTCATCTTCAACAAGATCCACCACGAACTGAACCAGTTCT 

CTGCAGACTATGACAAGGCCCTCATCTTCAACAAGATCCATCATGAGCTTAACCAGTTCT 
490 500 510 520 530 540 

510 520 530 540 550 560 

GCAGTGTGCACACGCTTCAAGAGGTCTACATTGAGCTGTTTGATCAGATTGATGAAAATC 



GCAGCGTTCATACTCTTCAGGAAGTCTATATCGAGCTGTTTGATCAAATTGATGAAAACC 
550 560 570 580 590 600 

570 580 590 600 610 620 

TCAAACTGGCTTTGCAACAGGACCTGACCTCCATGGCCCCTGGGCTGGTCATTCAAGCTG 



TCAAGTTGGCTTTGCAGCAGGACCTGACTTCCATGGCCCCTGGGCTGGTTATCCAAGCTG 
610 620 630 640 650 660 

630 640 650 660 670 680 

TGCGGGTAACAAAGCCCAACATACCAGAGGCAATCCGCAGAAACTACGAGTTGATGGAAA 

TGCGAGTGACAAAGCCCAATATACCTGAGGCAATCCGCAGGAACTATGAGCTGATGGAAA 
670 680 690 700 710 720 

690 700 710 720 730 740 

GTGAGAAGACAAAGCTTCTCATTCCCGCCCAGAAACAGAAGGTGGTGGAAAAGGAAGCAG 

GCGAGAAGACGAAGCTTCTCATTGCAGCCCAGAAGCACAAGGTGGTGGAAAAGGAGCCAG 
730 740 750 760 770 780

750 760 770 780 790 800

AGACAGAGCGCAAGAAGGCGCTCATTGAGGCAGAAAAAGTGGCCCAGGTGGCTGAGATCA 

AAACACAGAGGAAGAAGCCCCTCATTGAGGCAGAAAAAGTGGCACAGGTTGCAGAAATCA 
790 800 810 820 830 840 

810 820 830 840 850 860

CCTACGCGCAGAAGGTGATGCAGAACGAGACTCAGAAGAAGATTTCACAAATTCAAGATC 



CCTATGGCCAAAACGTGATGGACAACCAGACAG AGAAGAATGTGAAAAGATCTG7AG - TC 
850 860 870 880 890 900 

870 880 890 900 910 920

CTCCATTT -CTCCCCCGCCAGAACCCAAAGGCAGATCCTGAGTCCTACACTC - - CTATGA 

CTCACTTAACACTT - -TGACAAGAGCCTAACCATGCCCTTCACGCAACACCTACCTCTCG 
910 920 930 940 950 960
930 940 950 960 970 980

AAATAGCCGAAGCCAATAAGCTGAAGCTAACCCCTGAATATCTGCAGCTGATGAAGTACA 

GAGAAGGAGGAGGCA GCCATTTCTAACTC GTTTCTATAGAAGCCCTGGGTAG 

970 980 990 1000 1010 

990 1000 1010 1020 1030 1040 

AGGCCATTGCTTCCAACAGCAAGATTTACTTTGGCAAAGACA - TTCCTAACATGTTCATG 

ATGCCTCAGCA- -CGGTCCCTTTTCATGCTTTGATTGACACTCAACCT- -CGGGAGGAAA 
1020 1030 1040 1050 1060 1070 

1050 1060 1070 1080 1090 1100 

GACTCTGCGGGCAGTGTGAGCAAGCAGTTTGAGGGGCTAGCTGACAAGCTAAGCTTTGGC 
: : : : : : . . : : : : : . : : . . : : : . : : . : 

CCCTCTGCA- -C GTGACCTGTCAATATG- -GTGCTAAATGT- -GTCTATG GAC 

1080 1090 1100 1110 1120 

1110 1120 1130 1140 1150 

TTAGAAGATGAAC -CCTTGGAGA-CGGCC ACTAAGGAGAATTGAAAAAAACTTGAT 

CCTGCTCTCCGTCTCCAGGCAGTTCTACCGTATACTTGGACCCTTGGGTTATAGCTAGCC 
1130 1140 1150 1160 1170 1180 

1160 1170 1180 1190 1200 1210 

ATG ACTGCAAATGATACT - TAAGC AG ATCTTTATTTTTT AAGATGAATC AGAATGTTCCT 

ACTGCTGGTGTTTATGTGAACATTCCTATAAATTC - AATTTCCCTCTGGA-GTTCCA 

1190 1200 1210 1220 1230 

1220 1230 1240 1250 1260 1270 

CCCTCCCCGACTACCTTCTCTGACTGTCTTCCAGTTACTGTGGTGAAAAAGAAGAAATGA 

CGCTACGC - -CTG- -TGC-CAGGCAAAC - -CCTGTCCCTA- -GAACATAGCCTGGACGTC 
1240 1250 1260 1270 1280 

1280 1290 1300 1310 1320 1330 

ACTTAAATCCACTCCCTTTCTACGGAAAGGAGGGTCGGGACTGATCATGGGGGGTTTTAT 

ACACCTACTCTGTACATTTCT - - -GCTTCGTTCATTCC - TCTGTAGTTCCACCGCTTAGA 
1290 1300 1310 1320 1330 1340 

1340 1350 1360 1370 1380 1390 

TTCAGGTAAGCAGTTTATATGACTTCCAATAAGATTTGTAAATCATGGGCTTGACCTTTG 

T- -CGAGAAACAAGAGTCTAACCTTCTCATCCTCCCAGTTT-TC -TGCATTAGAC-TTCC 
1350 1360 1370 1380 1390

1400 1410 1420 1430 1440 1450

ACCTCTACACACTAATTTTATCCTTTGA -GGCTCGCTTAATTAC - -GGATCCTGTCAT - T 

A - - TCAATA7TCTTCTAA - ATCCTCTGACAAATGATCTAATTAGAAGAAATCACACCTCT 
1400 1410 1420 1430 1440 1450 

1460 1470 1480 1490 1500 1510 

AAGGAGAGGGAGAAATGTAGAGTGTTACCTCCAACTCATTTGATTTCCCTTACTTGGGAA 



FI6 1}(3opH) 
TTCCTGTGTGCATTGCTGGGACAAATGCCTC CATTAGAAA ATTC AAAGAAA

1460 1470 1480 1490 1500 

1520 1530 1540 1550 1560 

AATGCAGTCCAGTGTTCTCACCTCTG--CCTCCAAGGTAGGAGATGTCTGTGGGTGAGGC 

GTCATAATCGAGAAT - CTCTTTGGTGGTCCTCTAAGGCGGGT - - TGTTTTTCAATGTTGT 
1510 1520 1530 1540 1550 1560 

1570 1580 1590 1600 1610 1620 

TYWKCAACTGAGCAAATATGTGCCTGTGAGTTTGCCAGTAGAGCTGTGAAGAAACAGCTG 

TG - TCTT - GG AGCTTGG AGGTG AAATTC AATGT TTAAAATTTTT AGG AAATTT ATA 

1570 1580 1590 1600 1610 

1630 1640 1650 1660 1670 1680

CAGAGAA - C ATTTG ACCTTCCTGGC ATTCTTGTCTGCATGTGTGTGAGTTATTTTAG AGG 

C AAAG AAACTTTT AAATAAAGTAT ATTG AATGT - GCC ATGAAAAAAAAAAAAAAAAAAGG 
1620 1630 1640 1650 1660 1670 


1690 1700 1710 1720 1730 1740 

TGTGCTTTCTTGAGCCCTCATAAGGAAGTACTGGTGCTAGGTTTTGCAAGATTTKGTATA 

CCGGCCC 
1680 
10 20 30

GTAAAAATGTGCCTTGTGGAACAAGTGGTGG 
X: : 
TGTGCAGACAACACTACAAACTGATGAAGTTAAAAATGTGCCTTGTGGAACAAGTGGTGG 
240 250 260 270 280 290 

40 50 60 70 80 90 

AGTCATGATCTATATTGACCGAATAGAAGTGGTTAATATGTTGGCTCCTTATGCAGTGTT 



GGTCATGATCTATATTGACCGAATAGA.\GTGGTTAATATGTTGGCTCCTTATGCAGTGTT 
300 310 320 330 340 350 

100 110 120 130 140 150 

TGACATTGTGAGGAACTATACTGCAGACTACGACAAGACTTTAATCTTCAATAAAATCCA 



TGATATCGTGAGGAACTATACTGCAGATTATGACAAGACCTTAATCTTCAATAAAATCCA 
360 370 380 390 400 410 

160 170 180 190 200 210

CCATGAGCTGAACCAGTTTTGCAGTGCCCACACACTTCAAGAACTTTACATAGAATTGTT 



CCATGAGCTGAACCAGTTCTCCAGTGCCCACACACTTCAGGAAGTTTACATTGAATTGTT 
420 430 440 450 460 470 

220 230 240 250 260 270 

TGATCAAATAGATGAAAACCTGAAGCAGGCCCTGCAAAAAGATTTAAACACCATGGCCCC 



TGATC.^TAGATGAAAACCTGAACCAAGCTCTGCAGAAAGACTTAAACCTCATGCCCCC 
480 490 500 510 520 530 

280 290 300 310 320 330 

ACGTCTCACTATCCAGGCTGTGCGTGTTACAAAACCCA-\AATCCCAGAAGCCATAAGAAG 



ACGTCTCACTATACAGGCTGTGCGTGTTACAAAACCCAAAATCCCAGAAGCCATAAGAAG 
540 550 560 570 580 590 

340 350 360 370 380 390

A-^ATTTTCAATTAATGGAGGCAGAGAAGACAAAACTTCTCATACCTGCACAGAAACAAAA 
AAATTTTGAGTTAATGGAGGCTGAGAAGACAAAACTCCTTATAGCTGCACAGAAACAAAA
600 610 620 630 640 650 

400 410 420 430 440 450 

GGTGGTGGAGAAAGAAGCTGAGACGGAGAGGAAAAGGGCTGTTATAGAAGCAGAGAAGAT 

GGTTGTGGAAAAAGAAGCTGAGACAGAGAGGAAAAAGGCAGTTATAGAAGCAGAGAAGAT 
660 670 680 690 700 710 

460 470 480 490 500 510 

TGCACAAGTAGCAAAAATTCGATTTCAACAGAAAGTGATGGAGAAAGAAACTGAAAAACG 



TGCACAAGTGGCAAAAATTCGGTTTCAGCAGAAAGTGATGGAAAAAGAAACTGAAAAGCG 
720 730 740 750 760 770 

520 530 540 550 560 570 

CATTTCTGAGATTGAAGATGCTGCGTTCCTGGCCCGAGAGAAGGCAAAAGCAGATGCCGA 



CATTTCTGAAATCGAAGATGCTGCATTCCTGGCCCGAGAGAAAGCGAAAGCAGATGCTGA 
780 790 800 810 820 830

580 590 600 610 620 630 

GTATTACGCTGCACACAAATACGCCACCTCAAACAAGCACAAACTGACCCCAGAGTATCT 



ATATTATGCTGCACACAAATATGCCACCTCAAACAAGCACAAGTTGACCCCGGAATATCT 
840 850 860 870 880 890 

640 650 660 670 680 690

GGAGCTCAAGAAATACCAGGCCATTGCCTCAAACAGTAAGATCTACTTTGGCAGCAACAT 



GGAGCTCAAAAAGTACCAGGCCATTGCTTCTAACAGTAAGATCTATTTTGGCAGCAACAT 
900 910 920 930 940 950 

700 710 720 730 740 750 

CCCCAGCATGTTTGTGGACTCCTCCTGTGCTCTGAAATACTCTGATGGTAGGACTGGGAG 



CCCTAACATGTTCGTGGACTCCTCATGTGCTTTGAAATATTCAGATATTAGGACTGGAAG 
960 970 980 990 1000 1010 

760 770 780 790 800 810

AGAAGACTCCCTTCCCCCAGAGGAGGCCCGTGAGCCCTCTGGAGAGAGCCCCATCCAAAA 



AGAAAGCTCACTCCCCTCTAAGGAGGCTCTTGAACCCTCTGGAGAGAACGTCATCCAAAA 
1020 1030 1040 1050 1060 1070 

820 830 840 850 860 870

CAAGGAGAACGCACGTTGATGCAAGAGCTGCAAATGTTCTCCCATATCAAGATCCGACCC 



C\AAGAGAGCACACGTTGATGCAAGAGGTGGAAATGTTCTCC - ATATCAAGA TCTGGCCC 
1080 1090 1100 1110 1120 1130

880 890 900 910 920 930

.^CGGCCTAAGTCGGAACACTGGTTATGTGGACTCGT/V\GATTCACAGAC.^TGTGTGCT 



,\AGGGGTT.AAGTGGGAACAATCATTATACGGACTCTTCAGATTTACAGAGA.ACTTACACT 
1140 1150 1160 1170 1180 1190
940 950 960 970 980
CTGTTGTGATTCTCTTGTCATAGTCCTGGTTTGCCAGCTGACTACAGGATAGACCC 

TCATCTGTTCCACCTCTCCTGCGATAGTCCTGGGTGCTCCACTGATTGGAGGATAGAGCC 
1200 1210 1220 1230 1240 1250 

990 1000 1010 1020 1030 1040 

AGCTGTCTGGCACTCAAACGGTCTCTGCAGCCACAGTTTTATCAAGTATCCTGTATGTGT 



AGCTGTCTGACACACAAATGGTCTTTTCAGCCACAGTCTTATCAAGTATCCTATATGTAT 
1260 1270 1280 1290 1300 1310 

1050 1060 1070 1080 1090 1100 

TCCTTTGTAAACCGGTACTCATGAATGAGGGAAAGTCTGATGCTAAGATACTGCCTGCAC 

TCCTTTCTAAACTGCTACTCATG AATGAGG - AAAGTCTGATGCTAAG ATACTGCCTGCA - 
1320 1330 1340 1350 1360 1370 

1110 1120 1130 1140 1150 1160 

TGGAATGTCAAACACTATATAACAAGCTGTGGTTTTTAAAAGCTATTGAATAATGTTTAC 



1170 1180 1190 1200 1210 1220

ATTGGTCCCTGAGGACATGTGTGCTCAGACATTCAAGAGCTACGAGGCCAGAGAGAAGAC 



-TTCCCTG-



1230 1240 1250 1260 1270 1280 

CTTCAGAAAACGGTAAGTTAAAGAAGACAAGTGTCATCAGACACTTGGGACCCGGGCTCT 

CATTGGGTT GATG AC TGTCAGC A TC A 

1380 1390 1400 

1290 1300 1310 1320 1330 1340 

CTTTAAAGTCTAGTCCCGGCATTCCTCCATGTGATTGACAGCCAGACCTCTGGGTTCCCA 



CTG - -CCG - CAGGCCA 

1410 

1350 1360 1370 1380 1390 1400

GGAAATTATCTTCCAGTTGAATGACCATTTACTTGATACAAATTGTACCTTTCTGTTTTT 

TGCTTG - - ACTAAG - GTACCT 

1420 1430 

1410 1420 1430 1440 1450 1460 

CTAGTCAGGTTGCTGGCCTGCAGGGACGCGTACTTTCCCACCCCACCAGAGGTTCCTCGA 

OGTT TTAGCCA - -CACCCA CCTC - - 

1440 1450

1470 1480 1490 1500 1510 1520

ACATA rTCCCAATCACTAGTTTATTGCGTTAGGAGACTCAGAGATATAGAAAGCAGCTGA 
CTTGTAT

1460 

1530 1540 1550 1560 1570 1580 

AATTTAAGGGAGATAAAGCCTGCACTGCACCAAAGCTACGGGTCCCTGTGTTTCCTCTAT



GTTACCT T 

1470 

1590 1600 1610 1620 1630 1640 

TCAGTGATGTCATCAACCTCACTGTCCCAGCCCATGTGTGACTAAAGTGCCCGGTTTTAG 

TC AG CTCTGGCC AAGAG 

1480

1650 1660 1670 1680 1690 1700

CCACAGACAACTGCTTAGATGTCACCTCTTGGCTGACCAAAGCTGGGACAGGGCTTTAAC 



TGGGACAGGGTTTTAAC

1490 1500 

1710 1720 1730 1740 1750 1760 

CAGACATAGGAGCAGTGTGCAATTCCTGAT-TCA--CTGCACAGTATTATGTCATAATTG 



CACAAATAGGAGCAGCATGCAATTCCTAGTGACTTGCTGCACAGTATTGTATCATAATTA 
1510 1520 1530 1540 1550 1560 

1770 1780 1790 1800 1810 1820

CAGGAATTATTTTTTGTTTTTAAAACTGGATTTGGGGCACATTCATTCACCCCAACACTT 

CAGGAA GTTTTTATTTTTAAAACTGGATCTGGGCTATATTCATTTGCCCCATCACCT 

1570 1580 1590 1600 1610 1620

1830 1840 1850 1860 1870 1880

CTATCTAAAGGCCAAGGTTCTAGGGCTGCTATGGTCACTAACACACTGATTCTCCTTAAA 



CTGTCTAAAGGCCCAAGTCCTAGGGCTGCCATGGTCACAAGCACACTGATGCTCCTTAAG 
1630 1640 1650 1660 1670 1680

1890 1900 1910 1920 1930

CTAATT -CTCGAACTGTGGAACAAAGTG- -ACCCAGACAGCATCCTCACT 



ATTGTTTATCTGGAGCCCACATAGTGTGGAACAAAAAGTCACCTAGAAAGCATCCTTGGT 
1690 1700 1710 1720 1730 1740

1940 1950 1960 1970 1980

CATCTTTGTCTCCTTCCCT GGGATGCAGATACCCAAOTTCCTTTTCCAACT 



(;ArCATTu'TCTCCTTCCCACCT<J(X^:(:At;AOATCiCTTA.VATCCAAGTTCTTTCTCCACCT 
1750 1760 1770 1780 1790 1800

1990 2000 2010 2020 2030 2040

TTCGCCTCCCCTAiiGAGATCAOAAAGAATTCTTGTGACTTCCTGGGCAGCCATTGAATTC 



i7rCAv:CTi:CCCCAG<;AGArf:A(}GA TTCCACTGACGTCCTGCGCACCCACTGAATTT 

1810 1820 1830 1840 1850
2050 2060 2070 2080 2090 2100

A-TTTTCCATGAGAAGATGACAGAGTTAGCCTGTGGCTATAGGAGATCAT-GTCATCCAG 

AATTTTCCATGAGAA-ACAACAGAGTTAACCTGTGGCATTAGGAGACCTACTTCATGTGG 
1860 1870 1880 1890 1900 1910

2110 2120 2130 2140 2150 

ACC - TTTTTGCCCATCACATTAACTTTCCTGG AATATTGTGCTGCACAGGTAGACCTGAA 



ACCCTTTTTTTCCTTCAGTTTAACTTTTCTGGAGCAGTGTGCTGCGTAGTTCGGCCTGAG 
1920 1930 1940 1950 1960 1970 

2160 2170 2180 2190 2200 2210

TCTGCCCAGCTTGTT--GACAGCTCTTGTGTATACTGTGTTGAAGCCAGACAGAAAAGTA 



TTTGTGCAGCTTGTTAAGACAACTCTTGTGTACACTATGTTGAAGCTCAACAAAAAAGTC 
1980 1990 2000 2010 2020 2030

2220 2230 2240 2250 2260 
ATGGGGCCACTTCT - GAAACCTCTCAGCTGT TGA TCTCACAGCAGCTAAAG 



ATGGGACCACTTCTAGAAATCTTTCAGCTGTCAGGCCTGTCAGTCTCATGACAGTTTGTT 
2040 2050 2060 2070 2080 2090 

2270 2280 2290 2300 2310 2320 

GGTTGTGCCAAACA-TTTTATTAAGAAAGTAAAGCCCAGATTTGAATGGGGGTTTTCCCT 

CWTTGTGCCAAACACTTTATTTGGGA.\AGCAAAGCCCAGATTTGAATGGGTCTTTCCCCT 
2100 2110 2120 2130 2140 2150 

2330 2340 2350 2360 2370 

AGGCCTTATAGTATAGACGCATTTGTAATATGGAGAAAATAATTTTTC TCAT 

GGGCCTTATCCTATAGAGGCATTTGTAATATGGAGAAAATAATTTTTCATTTTTGCTCAT 
2160 2170 2180 2190 2200 2210 

2380 2390 2400 2410 2420 2430 

TTAATTATAGAAATTACCTTCAAACA- -GATTTTGTGTTCTTTGG- -C-CCTTCAAA-TA 



TTAATTCTATAAATTCTCTTTATAAATGAATTTTGTGTTCTTTAGTTCTCCTTAAAAGAA 
2220 2230 2240 2250 2260 2270 

2440 2450 2460 2470 
CTGGTGTTACATTGTTG CTG - CAG AT AAATG ATG ATTGTCGT 



CTTTTGAATTATAAAAATAAAATCTTTACCTGTCGAATTGTTGCTGCAGATGATTGTTGT 
2280 2290 2300 2310 2320 2330

2490 2500 2510 2520 2530

GGGATATCTGGATCACTGACCTCTGTGCTTTCATTCCTAGAGATGTTTCTCATTCCCATT 

GGAAAATCTCGATCATTGACCTCTGTGCTTTCATTCCTAGAGATGTTTTATACTTACATG 
2340 2350 2360 2370 2380 2390 

2540 2550 2560 2570 2580 2590

TAGTt.»AAATGCTGTTGCCCCA/\AGT f lATGGTTGTGGCA'rTTCTTACCGCTCATAGCCCCC 
-AGCAAAA-GCTGTTGCCCCAAAGTGATGGCCCTGGAGG CGG GGC

2400 2410 2420 2430 2440 

2600 2610 2620 2630 2640 2650 

GGTGAGGAGCAGGGAAGCGCCATTCTGAAAGATTAAACAAAGCACTTCCACTTGAGCTCC 

- - TGAGG AACAGGGAAATGCCGCTGTGAAGTCTTAAA GCACTTCTGCTTAAACTCC 

2450 2460 2470 2480 2490

2660 2670 2680 2690 2700 

TTATG GAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACCATCCTTCAG 



ATGTGTGAGGAGTGTGCCTCCCTGTGCCCTCTCAGC--TCTGAGGCTGGCCGTCTTTCGG 
2500 2510 2520 2530 2540 2550 

2710 2720 2730 2740 2750 2760 

GGACGTTCCTTTTGGTAAATATACACTGTAATCTTTAAGTCTAAATTTATATGTGAAAGT 

GGT -GTTCCTTTTGGCAAATATACACTGTAATCTT - G AGTCTAAATTT AT ATG TTGAAAT 
2560 2570 2580 2590 2600 2610 

2770 2780 2790 2800 2810 2820

- - TAACTTTTTT TAAAAACCTAAATAAAATTATTTTCCTATCAAAAAAAAAAAAA^ 

GCTACCTTTTTTAAAATAAGAAACTAAATAAAATTATTTTACTATCAAAAAAAAAAAAAA 
2620 2630 2640 2650 2660 2670 

2830 
AAAGGGCCGCC 


AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
2680 2690 2700 



(^6 3M (&or6) 
10 20 30 40 50

GTCGACCC ACGCGTCCGGCGGGG ACAACTGGGTCTTTTGCGGCTGCAGC - GGGCTTGTAG 

GTCGACCCACGCGTCCGGC CTGCTGA-TCAGTGGCGGCTGCGGCTGAGCTTGCAG 

10 20 30 40 50 

60 70 80 90 100 110 

GTGTCCGGCTTTGCTGGCCCAGCAAGCCTGATAAGCATGAAGCTCTTATCTTTGGTGGCT 



GCATCTAGTCTTGCTGGCTCAGCAAGCCCGATAAGCATGAAGCTGCTGTGTTTGGTGGCT 
60 70 80 90 100 110 

120 130 140 150 160 170 

GTGGTCGGGTGTTTGCTGGTGCCCCCAGCTGAAGCCAACAAGAGTTCTGAAGATATCCGG 



GTGGTGGGGTGCTTGCTGGTGCCCCCAGCTCAAGCCAACAAGAGCTCTGAAGATATCCGG 
120 130 140 150 160 170 

180 190 200 210 220 230 

TGCAAATGCATCTGTCCACCTTATAGAAACATCAGTGGGCACATTTACAACCAGAATGTA 



TGCAAATGCATCTGTCCGCCTTACAGAAACATCAGCGGGCACATTTACAACCAGAATGTG 
180 190 200 210 220 230 

240 250 260 270 280 290 

TCCCAGAAGGACTGCAACTGCCTGCACGTGGTGGAGCCCATGCCAGTGCCTGGCCATGAC 



TCTCAGAAGGACTGCAACTGCCTGCATGTGGTGGAGCCCATGCCAGTGCCTGGCCACGAT 
240 250 260 270 280 290 

300 310 320 330 340 350 

GTGGAGGCCTACTGCCTGCTGTGCGAGTGCAGGTACGAGGAGCGCAGCACCACCACCATC 



GTGGAAGCCTACTGCCTCCTCTCCGAGTGTAGGTACGAGGAGCGTAGCACCACAACCATC 
300 310 320 330 340 350 

360 370 380 390 400 410 

AAGGTCATCATTGTCATCTACCTGTCCGTCGTGGGTCCCCTCTTGCTCTACATGGCCTTC 



A-XCGTCATTATTGTCATCTACCTGTCTGTCGTGGGCGCCCTCTTACTCTACATGGCCTTC 
360 370 380 390 400 410

420 430 440 450 460 470 

CTGATGCTGGTGGACCCTCTGATCCCAAAGCCGGATCCATACACTGAGCAACTGCACAAT 



CTGATGCTGGTGCACCCGCTCATCCGGAAGCCAGATGCCTATACTGAGCAGCTGCACAAT 
420 430 440 450 460 470 

480 490 500 510 520 530

iJAGGAi.U^AGAATGACGATCC'nrt^T'nrTATGGCAGCAGCTGCTGirATCCCTCCCGGGACCC 
GAAGAGGAGAATGAGGATGCTCGCACCATGGCAACAGCCGCTGCGTCCATTGGAGGACCC
480 490 500 510 520 530 

540 550 560 570 580 590 

CGAGCAAACACAGTCCTGGAGCGTGTGGAAGGTGCCCAGCAGCGGTGGAAGCTGCAGGTG 



CGGGCAAACACTGTCCTGGAGCGGGTGGAAGGCGCTCAGCAGCGGTGGAAGCTGCAGGTG 
540 550 560 570 580 590 

600 610 620 630 640 650 

CAGGAGCAGCGGAAGACAGTCTTCGATCGGCACAAGATGCTCAGCTAGATGGGCTGGTGT 



CAGGAGC AGCGGAAG ACGGTCTTCG ACCG AC ACAAG ATGCTCAGTTAG ATGGT - TGCCAT 
600 610 620 630 640 650 

660 670 680 690 700 710 

GGTTGGGTCAAGGCCCCAACACCATGGCTGCCAGCTTCCAGGCTGGACAAAGCAGGGGGC 

GATTGCATCAGAGACCTGG-GCCATGGCTACCAGCTTCTGGG GCT C 

660 670 680 690

720 730 740 750 760 770 

TACTTCTCCCTTCCCTCGGTTCCAGTCTTCCCTTTAAAAGCCTGTGGCATTTTTCCTCCT 



-ACTGCAGTCTTCCCT-GG GTCTTCCCTTCAAATGCCCATGGCGTTTATCC T 

700 710 720 730 740 

780 790 800 810 820 830 

TCTCCCTAACTTTAGAAATGTTGTACTTGGCTATTTTGATTAGGGAAGAGGGATGTGGTC 



TCTCCCT- -CTCTAGAAATGT ACTCGACTGTTATAACGAGGG A -GTGTGATTGGGTC 

750 760 770 780 790 800 

840 850 860 870 880 890 

TCTGATCTCCGTTGTCTTCTTGGGTCTTTGGGGTTGAAGGGAGGGGGAAGGCAGGCCAGA 



TCTGTA GGTCT CTGGGGGGT AGAGGGG AGGGG - AGGGAAGGC - AG A 

810 820 830 840 

900 910 920 930 940 950 

AGGGAATGGAGACATTCGAGGCGGCCTCAGGAGTCGATGCGATCTGTCTCTCCTGGCTCC 



AGGGAACAGAGACATTTGAGGTGCCCACATGATTCGGTCGAATTCATCCCTCCTGTCTTC 
850 860 870 880 890 900 

960 970 980 990 1000 1010 

ACTCTTGCCGCCTTCCACCTCTGAG'rCTTGGCAATGTTGTTACCCTTGGAAGATAAAGCT 



AC -CATTCCTC CCAGCTCCACATCTTAAGGATGC - - TT AC CGCAGACCAACCT 

910 920 930 940 950

1020 1030 1040 1050 1060 1070

GGGTCTTCAGGAACTCAGTGTCTCG(»AkX*AAAGCATCCCCCAGCATTCAGCATCTGTTCC 



GTGTCATCAAGAGC'nrAGTf /(^ITGCGAwGAAAGTATGATCCAGCGCTCAGCC'rTCGCTCT 
960 970 980 990 1000 1010
1080 1090 1100 1110 1120 1130

TTTCTGCAGTGGTTCTTTATCACCACCTCCCTCCCAGCCCCAGCGCCTCAGCCCCAGCCC 



AGG ATGCTGTGGTCCCCATTC - CC AGTTCCTT - - CAGTGCCAGTACTTTAACTT - GGCC - 
1020 1030 1040 1050 1060 1070

1140 1150 1160 1170 1180 1190 

CAGCTCCAGCCCTGAGGACAGCTCTGATGGGAGAGCTGGGCCCCCTGAGCCCACTGG-GT 

-TACCCCAGTC-TCAGGA ACTGTTG TGGTGCCCCTGAGCCCACAGTCAT 

1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

CTTCAGGGTGCAC - TGGAAGCTGGTGTTCGCTGTCCCCTGTGCACTTCTCGCACTGGGGC 



CTCCAGAGTCC ACCTGGAAGCCTGT - TCCCCTCTCCTCGGCTC -CTGGTC -CACCAGTGC 
1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 

ATGG - AGTGCCC ATGC ATAC TC TGCTGC - - CGGTCCCCT - - CACC - TGCACTTGA 



ATGGCAGTGCCCATGCATGCCGGCATATTCAGCAGCTGTCACCTTACTCCCATCCCAGGA 
1180 1190 1200 1210 1220 1230 

1310 1320 1330 1340 1350 1360 

GGGGTCTGGGCAGTCCCTCCTCTCCCCAGTGTCCACAGTCACTGAGCCAGACGGTCGGTT



GGCCGTAAGGCC - TCCC ACCTCTGCCCTG TG ACTGC AGCTGCTGAGCC ATAA AGTT 

1240 1250 1260 1270 1280 1290 

1370 1380 1390 1400 1410 1420 

GGAACATGAGACTCGAGGCTGAGCGTGGATCTGAACACCACAGCCCCTGTACTTGGGTTG 



GGACCATATGACACAAGGCCAAT-GGGGACCGGAGTACCATGGCTCCTGTCCTTGGATGG 
1300 1310 1320 1330 1340 

1430 1440 1450 1460 1470 1480 

CCTCTTGTCCCTGAACTTCGTTGTACCAGTGCATGGAGAGAAAATTTTGTCCTCTTGTCT 



TCTCTTGTCCCTG AATTTC ATTGTATC A - TGCATGG AG AGAAAAAAAAAAAAAAAAAAAA 
1350 1360 1370 1380 1390 1400 

1490 1500 1510 1520 1530 1540 

TAGAGTTGTGTGTAAATCAAGGA/\GCCATCATTAAf\TTGTTTTATTTCTCAAAAAAAAAA 



AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
1410 1420 1430 1440 1450 1460

1550 1560 
AAAAAAAAAA CCGCCCCCG 



AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGCGGC - - 
1470 1480 1490 1500 1510
10 20 30 40 50 60

GCACGAGTCCAGACGGAAGTGCGGGCGGAGGATCCCCAGCCGGGTCCCAAGCCTGTGCCT 

G - TCG A CCCA - - CGCGTC C 

10 

70 80 90 100 110 120 

GAGCCTGAGCCTGAGCCTGAGCCTGAGCCCGAGCCGGGAGCCGGTCGCGGGGGCTCCGGG 

GGGC GC -GGGGCTCG GGGC TCGCAGGAGC GG 

20 30 40 

130 140 150 160 170 
CTGTGGGACCGCTGGGCCCCCAGCGATGGCGACCCTGTGG GGAGG CCTTCTTCGGCT 

CT GGCTCCC -GCGATGGCGACCCTATGGTGCGGAAACCTGCTGCGGCT 

50 60 70 80 90 

180 190 200 210 220 230 

TGGCTCCTTGCTCAGCCTGTCGTGCCTGGCGCTTTCCGTGCTGCTGCTGGCGCAGCTGTC 

GGGCTCGGGGCTCAGCATGTCCTGCCTGGCGCTGTCCGTGCTGCTGCTCGCGCAGCTGAC 
100 110 120 130 140 150 

240 250 260 270 280 290 

AGACGCCGCCAAGAATTTCGAGGATGTCAGATGTAAATGTATCTGCCCTCCCTATAAAGA 

AGGCGCCGCCAAGAATTTTGAAGATGTGAGATGTAAATGCATCTGCCCTCCCTATAAAGA 
160 170 180 190 200 210 

300 310 320 330 340 350 

AAATTCTCGGCATATTTATAATAAGAACATATCTCAGAAAGATTGTGATTGCCTTCATGT 

GAATCCTGGGCACATTTATAATAAGAATATATCTCAGAAAGATTGTGATTGCCTTCATGT 
220 230 240 250 260 270 

360 370 380 390 400 410 

TGTGGAGCCCATCCCTCTGCGGGGCCCTGATGTAGAAGCATACTGTCTACGCTGTGAATG 

CGTGGAGCCCATGCCTCTACGGGGACCTGATGTAGAAGCATACTGTCTACCCTCTGAATG 
280 290 300 310 320 330 

420 430 440 450 460 470 

CAAATATGAf\GAAAGAAGCTCTGTCACAATCAAGGTTACCATTATAi\TTTATCTCTCCAT 

CAAATACGAAGAGAGAAGCTCTGTCACA/\TCAAGGTTACCATTATAATTTATCTCTCTAT 
340 350 360 370 380 390 

480 490 500 510 520 530

TTTCCGCCT'I'CTACTTCTGTACATGGTATATCTTACTCTCCTTGACCCCATACTCAAGAG 
TTTGGGCCTTCTGCTTCTGTACATGGTATATCTTACCTTAGTTGAGCCCATCCTGAAGAG
400 410 420 430 440 450 

540 550 560 570 580 590 

GCGCCTCTTTGGACATGCACAGTTGATACAGAGTGATGATGATATTGGGGATCACCAGCC 

GCGCCTCTTTGGACACTCCCAGCTGTTGCAGAGCGATGATGACGTTGGGGATCACCAGCC 
460 470 480 490 500 510 

600 610 620 630 640 650 

TTTTGCAAATGCACACGATGTGCTAGCCCGCTCCCGCAGTCGAGCCAACGTGCTGAACAA 

TTTTGCAAATGCCCATGATGTGCTGGCCCGCTCTCGCAGCCGAGCCAATGTTCTAAACAA 
520 530 540 550 560 570 

660 670 680 690 700 710 

GGTAGAATATGCACAGCAGCGCTGGAAGCTTCAAGTCCAAGAGCAGCGAAAGTCTGTCTT 

GGTGGAGTACGCTCAGCAGCGCTGGAAGCTCCAGGTCCAGGAGCAGCGAAAGTCTGTCTT 
580 590 600 610 620 630 

720 730 740 750 760 770 

TGACCGGCATGTTGTCCTCAGCTAATTGGGAATTGAATTCAAGGTGACTAGAAAGAAACA 

CGACCGACACGTTGTCCTCAGCTAACTGGGAACTGGAATCA-GGTGACTAGGAAGAA-CA 
640 650 660 670 680 690 

780 790 800 810 820 830

GGCAGACAACTGGAAAGAACTGACTGGGTTTTGCTGGGTTTCATTTTAATACCTTGTTGA 

CGCAGACAACTGGGAAGAATTGTCTGGGTGT- -CCGTG CCTTTTAATGCCATGTTTG 

700 710 720 730 740 

840 850 860 870 880 

TTT CA — CCAA -CTG - TTGCTGG AAG ATTC AAAACTGG AAGC AAAAAC - TTG CTTG 

TTTTTACAAATCCTTGCTCGATGGAGGAAGACTCCAAACTGGAAGCAAACCCCATGCTTG 
750 760 770 780 790 800 

890 900 910 920 930 940 

ATTTTTTTTTCTTGTTAACGTAATAATAGAGACATTTTTAAAAGCACACAGCTCAAAGTC 

GTATTTT- - - CCTGTTAATATATTAATAG AGACATTTTTACA - GCACACAGTTCCAAGTC 
810 820 830 840 850 860 

950 960 970 980 990 1000 

AGCCAATAAGTCTT r rTCCTATTTGTGACTTTTACTAATAAAAATAAATCTCCCTGTAAAT 

AACCAGTAAGTCTTTTCCTACTTGTGACTTTTACTAATAAAATTAAG-CTGCCTGTGAGT 
870 880 890 900 910 920

1010 1020 1030 1040 1050 1060

TA rCTDGA/MrrCCTTTACCTGGAACAAGCACTCTCTTTTTCACCAC-CA -TCCT—CTTT 

TATCTT(;AA(7CCCCGTGCCT(;GAACAAGCTCTCTCTTTCTTCCCACACAGTTCT/\ACTTC 
930 940 950 960 970 980
1070 1080 1090 1100 1110

- TCCTCATGG AAATGTC TGC - TTTATGAAACT - ATGCACATATTGAAAGTGAGTTG 

GTGTTCAAGATAACTTCCAGGTGTGTTTTTGCTTCTCTTTCTTGTGGTGGGAGAGAGAAG 
990 1000 1010 1020 1030 1040 

1120 1130 1140 1150 
AAA CAAATGAGGG - TTGGGTAG GAG - CTT — CCAGGC CTGGGA 

GAAGGATGCCITGGGAGTGCTTGAGTAGCTTCTCAAGTGTCTTTTCCAGACAGACTTATG 
1050 1060 1070 1080 1090 1100 

1160 1170 1180 1190 1200 

TTTAC AC CACGCCT A - - GCCCAGCAG AGGCCTTAGTCCCATT - TGG — GGCTT TGGG 

AATACTTCAGACCCTCTACTTCACACTTGTTAATGTCCCAGTGTAGCTGGCTTGTCAGCG 
1110 1120 1130 1140 1150 1160 

1210 1220 1230 1240 
AG TGACATTTGCT - TGA -GGCTT AT AC A CTGGT G 



TGCTGGCCTCCCCACTTGACTTTTGCACTGACTACATTACCTAAGATTCTGGTTAGCCTG 
1170 1180 1190 1200 1210 1220 

1250 1260 1270 
TGGTTGCCTGGCTTG - - C AG GAAATGA CCAAG CTC AC A 



TGGCTGCATTTCATGACCAGTTGGATCTGAAATGCCTGGGGGCTCCTCACAAAATGAAGA 
1230 1240 1250 1260 1270 1280 

1280 1290 1300 1310 1320 
CATGC TGCCTC AAGCCT - AACMR - KACAACTGAGGTACTCTTTTGA 



TTTGTTTC ATGCACTGTG ATGTCTG ACGC AAC ATGTTCT AG AAC AG ACTGGC - C ATCTGC 
1290 1300 1310 1320 1330 1340 

1330 1340 1350 1360 
AGGATGAAGGTGGTG - - G ATTCTC AGCC - CTGGG GGTCTTCCTCA - C 



TAGTTTACACTGATACCTAAACACAGTCTCAGTGTGTGTGGTCTTCCTCATCTTCTTCTA 
1350 1360 1370 1380 1390 1400 

1370 1380 
CTGAGGAC C CTT CAGAGCCACCC 



GTAGCTCTAAGGACTTCAACATTTAGAATAAACACATTTTCTCTTAAGCCCAAGCCTCCC 
1410 1420 1430 1440 1450 1460

1390 1400 1410 1420 1430 
TTTCTACTT TGCATTTCCTGGTCCACACATTTAAGGCATA ACACCACAT 



TGGATGATTGACGTACA/V,\TACT(;AT-CAGCCTTTTCTGTCTTGCTGAGAGGCAGTTCTT 
1470 1480 1490 1500 1510

1440 1450 1460 1470 1480
TCATCCCTT - TOGTTTG - - -GGATCT C AGGAATACAGT CC -CATGCAAAGAT- 
TGAACTGATGTGGGCAGCTTTGAACAAGGACTAGAGTTCAGATTGCCTCTCTCTGAGAAG
1520 1530 1540 1550 1560 1570 

1490 1500 1510 1520 1530 1540 

TCTCTGGTTTTATGGCTTTTTTCCCTTTCT-TTACACCATCCTCTCCCATAAGCACCCAT 
: : : . . : . . : : : : : : : : : . :::::::: . . : . : : : : : 

TCTAACAGTTATTGGATAACTGGCTTTTTTCTTCCTACATCCTCTTTGGAATGTAACAAT 
1580 1590 1600 1610 1620 1630 

1550 1560 1570 
GTCTTTGAATATG AATGTATTTGTAAAATAAAAAA 



AAAATAATTTACAAAACCCAAAAAAAAAAAAAAGGGCGGCCG 
1640 1650 1660 1670 1680 
10 20 30 40 50
h- o Kti k> GTCGACCCACGCGTCCG CTCTGAGTCACCGGAATCTAGGTGGGGC CGCC - CG 

MOfciNe G7CGACCCACGCGTCCGGCGCTCTGAGTCACCGGAATCAAGGTGTGGCTGGAGCGCCGCT 
10 20 30 40 50 60 

60 70 80 90 100 

GAGCGGCGTCCT CGGGAGCCGCCTCCCCG- CGGCCTCTTCGCTTTTGTGGCG 

CCCCCGCCGCCAGCCCGGGGGCCGCGTCTTCGGGGGAGCCGCCTCTTC - CTTTAGTCGCG 
70 80 90 100 110 

110 120 130 140 150 160

GCGCCCGCGCTCGCAGG - CCACTCTCTGCTGTCGC - CCGTCCCGCGCGCTCCTCCGACCC 

GTGTCAGCGCTCGCAGGACCACTCTTGGCCGCTGCTCCTGCCCG -GCGTTCCTCCG 

120 130 140 150 160 170 

170 180 190 200 210 220 

GCTCCGCTCCGCTCCGCTCGGCCCCGCGCCGCCCCTCAACATGATCCGCTGCGGCCTGGC 

- CTCCGCGC CCGC CGCCACC - GACGACATGCTGCGCTGCGGCCTGGC 

180 190 200 210 

230 240 250 260 270 280 

CTGCGAGCGCTGCCGCTGGATCCTGCCCCTGCTCCTACTCAGCGCCATCGCCTTCGACAT 

CTGCGAGCCCTGCAGCTCGATCCTCCCCCTCCTCCTGCTCAGCCCCATCGCCTTCGACAT 
220 230 240 250 260 270 

290 300 310 320 330 340 

CATCGCGCTGGCCGGCCGCGGCTGGTTCCAGTCTAGCGACCACGGCCAGACCTCCTCGCT 

CATCCCGCTGGCCGGCCGCGGCTGGCTGCAGTCTAGCAACCACATCCAGACATCGTCGCT 
280 290 300 310 320 330 

350 360 370 380 390 400 

CTCGTGCAAATCCTCCCAACACCGCCGCGGCACCCGGTCCTACCACCACGGCTGTCAGAC 

TTCGTGGAGGTGTTTCGACCAGGGCGGCGGCAGCGGCTCCTACGACGATGGCTGCCAGAG 
340 350 360 370 380 390

410 420 430 440 450 460 

CCTCATGGAGTACGCGTGGCGTAGACCAGCCGCTCCCATGCTCTTCTCTCCCTTCATCAT 

CCTCATCGACTACCCATCGCGACGAGCACCTGCAGCCACCCTTTTCTGTCGCTTTATCAT 
400 410 420 430 440 450
470 480 490 500 510 520

CCTCGTCATCTGT'l 1 CATCCTCTCCITCTTCGCCL Id 0 1GGACCCCAGATGC rTCTCTT 

: : : : : : : : : : : : : : : : : : : ::::::::::: x :::::: i : : : 

CCTGTG CATCTC CTTCATTCTCTCGTTCTT CG C C CTCTGTGG AC CCCAG ATGCTTGTTTT 
460 470 480 490 500 510

530 540 550 560 570 580 

CCTGAGAGTGATTGGAGGTCTCCTTGCCTTGGCTGCTGTGTTCCAGATCATCTCCCTGGT 

::::::::: :::::::: : : : : : : : ::::::: .:.:::::::::::::::::::: 
CCTGAGAGTCATTGGAGGCCTCCTCGCACTGGCTGCCATATTCCAGATCATCTCCCTGGT 
520 530 540 550 560 570 

590 600 610 620 630 640

AATTTAC CCCGTGAAGTACAC CCAG AC CTTCACCCTTCATGCCAACCCTGCTGTCACTTA 

AATCTACCCCGTGAAGTACACACAGACCTTCAGGCTTCACGATAACCCTGCTGTTAATTA 
580 590 600 610 620 630

650 660 670 680 690 700 

CATCTATAACTGGGCCTACGGCTTTGGGTGGGCAGCCACGATTATCCTGATTGGCTGTGC 

CATCTATAACTGGGCCTATGGCTTCGG ATGGGCGG CCACCATCATCTTGATTGGTTGTTC 
640 650 660 670 680 690 

710 720 730 740 750 760 



CTTCT'l C T TCTGCTGCCTCCCCAACTACGAGGATG AC C ' rri ' X 'GGGGGCCGCCAAGCCCAG 
700 710 720 730 740 750 

770 780 790 800 810 820

GTACITCTACACATCTGCCTAACTTGGGAATGAATGTGGGAGAAAATCGCTGCTGCTGAG 
: . : . . : : : : 

GTACTTCTATCCCCCAGCCTAATGTGGGAGGAAGAGCCTGAGAAAAGC - CTGCTGCA - AG 
760 770 780 790 800 810 

830 840 850 860 870 880

ATGGACTCCAGAAGAAGAAACTGTTTCTCCAGGCGACTTTGAACCCATTTTTTCGCAGTG 

ATGGAT - - CTCAGGACGAAACTGTT - CTCCAAGGCACAAGGAACCTACCTTTGGGCAATG 
820 830 840 850 860 870

890 900 910 920 930 

TTCATATTATTAAACTAGTCAAAAATG CTAAAATAATTT - GGGAG AAAATATT TTTTAAG 



TTCATATGAT C AG AAATGCTAG AATAAATG CTAAAGAAAATTCTTCATAAT 

880 890 900 910 920 

940 950 960 970 980 990 

TAGTXnTATAGTTTCATCTTTAT CTTTT ATTATGTTTTGTGAAGTTGTGT CT 1 1 l 'CACTA 

TACTGTTA - AGTTTCATCTATCTCCT - - GTGCAGTTAAAAAGACTTGAAT TCTC 

930 940 950 960 970 

1000 1010 1020 1030 1040 1050 

ATTACCTAT ACTATCCCAATATTTCCTTATATCTATCC - AT AACATTTATACTACATTTG 

TTTGCTAAGTATATGCTAA 1T TTTCCTTATGTCAATTCTATACCATTTAAGCTTCATTTG 
980 990 1000 1010 1020 1030 

1060 1070 1080 1090 1100 1110 
TAAGAGAATATGCACGTGAAACTTAACACTTTATAAGGTA^AAATGAGGTTTCCAAG -AT

TTAAAG AATATo vJ CTGTGAAACTTGA TAAGGTAGAAATCTACt^GCCTCTCAT 

1040 1050 1060 1070 1080

1120 1130 1140 1150 1160 1170 

TXAATAATCTGATCAAGTTCTTGTTATTTC CAAATAGAATGG ACTCGGTCTGTTAAGGGC • 

TTAATAATCTGATGGCGCTTCTGTT - TTTCCACATAGAATGGGTTGTTTCTGCTAAGGGC 
1090 1100 1110 1120 1130 1140 

1180 1190 1200 1210 1220 1230 

TAAGGAGAAGAGGAAGATAAGGTTAAAAGTTGTTAATGACCAAACATTCTAAAA- - -GAA



: : ; 



TACAGAGGAG - GAAAGTCACTGGCAAAACT - - TCCGTGACCAAATATCCTGAAATTAGTA 
1150 1160 1170 1180 1190 1200 

1240 1250 1260 1270 1280 1290 

ATGCAAAAAAAAAGTTTATTTTCAAGCCTTCGA- - ACTATTTAAGG- -AAAGCAAAATCA 

TTTn " I T A AAAAGACCTTATTTTGAGTTTTCAGTTACATAAAAAAGCAGAAGCAGATTGG 
1210 1220 1230 1240 1250 1260 

1300 1310 1320 1330 1340 

TTTC CT AAATG CAT AT C ATTTG TG AG AATTT CT CATTAAT ATCCTG AATCATTCAT - TTT 

TTTCCTAAGTGAGCATCGTTTGTGAGAATTTTTAGTCAGTGTTTTG 
1270 1280 1290 1300 1310 1320 

1350 1360 1370 1380 1390 1400 

AGCTAAGGCTTCATCTTGACTCGATATGTCATCTAGGAAAGTACTATTTCATGGTTCAAA 

TTCTAAG -CTTCGTCTTCACTTTCTCTGATGCGTAGAAAAGT GTTCTAA 

1330 1340 1350 1360 1370

1410 1420 1430 1440 1450 1460 

CCTGTTGCCATAGTTGGTAAGGCTTTCCTTTAAGTG 

C- -GTAGCCAAGCTTAA -GCCGCTGTCACTAC TGAAATGCTAA- - - G AATTTTCCT 

1380 1390 1400 1410 1420 

1470 1480 1490 1500 1510 1520

CTTTTAAAGTTCTTTATACGGTTAGGCTGTGGGAAAATGCTATATTAATAAATCTGTAGT 

CTTTTCCCCTAGTCTAGAGGCGTAGCGTGTGGGAAGAACCCGTCTTACCACATCTGTAGT 
1430 1440 1450 1460 1470 1480 

1530 1540 1550 1560 1570 1580

GTTTTGTGTTTATATGTTCAGAACCACACTAGACTCCATTCAAAGATGGACTGGGTCTAA 

ATTCTCTC - -TCTATCCTTACAACCAGCGTAGACCCCATCGGAGGATGCACTAGGCCTAA 
1490 1500 1510 1S20 1530 1540 

1590 1600 1610 1620 1630 1640 

TTTATCATGACTGATACATCTCGTTAAGTTCTCTAGTAAAGCATTAGCAGGGTCATTCTT 

TCCCTCCCAACTCGTCGATGTGAAGAGCTCACGTACGAAGGCAC - AGCACCCTCACCACT 
1550 1560 1570 1580 1590 1600 

1650 1660 1670 1680 1690 1700 

CTCACAAAAGTCCCACTAAAACACCCTCACC AC AATAAATCAC TTCCTT 7TCTAAA 
GTCACAGCAGTGCCATGCACACATCCT - AGGAGAAGACATGGCAGTGTTTCTTCTCAGTG
1610 1620 1630 1640 1650

1710 1720 1730 1740 1750 1760

TCTCAGGT - TTATCTGGGCTCTATCATATAGACAGGCTTCTGATAGTTTGCAACTGTAAG 

: : . : . : :::::: : : . :::::::: 

CTTCTT CCCTTAACTGAGCTCTG - CTCACAGACAG - CTA - GAATAGATTTTAACTGTAA - 
1660 1670 1680 1690 1700 1710 

1770 1780 1790 1800 1810 1820 

CAGAAACCTACATATAGTTAAAATCCrGGT Cl M lTCrr GGTAAACAGATTTTAAATGTCTG 

CAGAAACCTAAATGTAATTAAAA - C CTGGTCTTCCTTGGTAAGCAGACTTAAAATATCTG 
1720 1730 1740 1750 1760 1770 

1830 1840 1850 1860 1870 1880 

ATATAAAACATGCCACAGGAGAATTCGGGGATTTGAGTTTCTCTGAATAGCATATATATG 

- TATAGTACATG CAAGTGG AAAATTTGCGAAT - - GCGTCTCTCTGAATA - CATACCGGAA 
1780 1790 1800 1810 1820 1830 

1890 1900 1910 1920 1930 1940 

ATGCATCGGATAGGTCATTATGA l ' Tl ' l T TA CCATTTCGACTTACATAATGAAAACCAATT 

GGGCTACTATTA CCTT TTCCTTACCATTTATACTTACCTAATGGAAACGAGCT 

1840 1850 1860 1870 1880

1950 1960 1970 1980 1990 2000 

CATTTTAAATATCAGATTATTATTTTGTAAGTTCTGGAAAAAGCTAATTGTAGTTTTCA^ 
: ::::::::::: : : : : : 

TGTTTTAACTATCAGAACACTATTTTGTAAGCTGCTGCAAAGAC - AGTTCAAGTTTTCAT 
1890 1900 1910 1920 1930 1940 

2010 2020 2030 2040 2050 
TATGAAGTTTTCCCAATAAACCAGGTATTCTAAAAAAAAAAAAAAA 

TAC - CAACTTCCCCAATAAACCAGGTGTTCAAAAAAAAAAAAAAAACAAAAAAAAAAAAA 
1950 1960 1970 1980 1990 2000 

2060 

AAAAACGCCCCCCCC 



AAAAAAAAAAAAAAAAAAAGGCCCGCCCC 
2010 2020 2030 
10

W*M 0 GTCGACCC ACGCGTCCG 



M'^Wc GTCGACCCACGCGTCCGCGGACGCGTGGGCACTCGGCCACTCTGCGGAGCAGGCATGGGA 
10 20 30 40 50 60 

20 30 40 50 60 70 

GCCGCGCGCTCTCTCCCGGCGCCCACACCTGTCTGAGCGGCGCAGCGAGCCGCGGCCCGG 



GCCGCGCGCGTCCTCCGGGCCCCCACACCTGTCTGAGCGGCGCA-CG-GCCGCGGCCCCG 
70 80 90 100 110 

80 90 100 110 120 130 

GCGGGCTGCTCGGCGCGGAACAGTGCTCGGCATGGCAGGGATTCCAGGGCTCCTCTTCCT 



GCGGGCTCCTCCACGCGGTA--GCACTCAGCATCCCTGGAATCCCGGGGCTCTTCATCCT 
120 130 140 150 160 170 

140 150 160 170 180 190 

TCTCTTCTTTCTGCTCTGTGCTGTTGGGCAAGTGAGCCCTTACAGTGCCCCCTGGAAACC 



TCTTGTC CTGCTCTGTGTGTTCATGCAGGTGAGTCCCTACACCGTTCCGTGGAAACC 

180 190 200 210 220 230 

200 210 220 230 240 250 

CACTTGGCCTGCATACCGCCTCCCTGTCGTCTTGCCCCAGTCTACCCTCAATTTAGCCAA 



CACATGGCCGGCTTATCGCCTCCCTGTAGTCTTGCCTCAGTCTACCCTCAACTTAGCTAA 
240 250 260 270 280 290 

260 270 280 290 300 310 

GCCAGACTTTGGAGCCGAAGCC.\AATTAGAAGTATCTTCTTCATGTGGACCCCAGTGTCA 



GGCAGACTTCGACGCCAAAGCGAAi\TTGGAGGTGTCCTCCTCATGTGGACCTCAGTGTCA 
300 310 320 330 340 350 

320 330 340 350 360 370 

TAAGGGAACTCCACTGCCCACTTACGAAGAGGCCAAGCAATATCTGTCTTATGAAACGCT 



C^GCC^CACCACTGCCCACCTACGAAGACCCC/VXGCAGTACCTTTCCTATGAAACCCT 
360 370 380 390 400 410 

380 390 400 410 420 430

CTATGCCAATCGCACCCGCACAGAGACGCAGCTGGGCATCTACATCCTCACCAGTAGTGG 



TTATGCCAATGCCACCCGCACAGAGACTCCGGTGGGCATCTACATCCTCAGCAATGGTGA 
420 430 440 450 460 470

440 450 460 470 480 490

Av;ATGCGGCCCAACAC^XlAi;AC'rt:A(li:GTC , rT(;AGGAAAGTCTCGAAGGAAGC(7GCAGAT 
AGGCAGGGCACGAGGCAGAGACTCGGAGGCCACAGGGAGATCTCGCAGGAAGAGGCAGAT
480 490 500 510 520 530 

500 510 520 530 540 550 

TTATGGCTATGACAGCAGGTTCAGCATTTTTGGGAAGGACTTCCTGCTCAACTACCCTTT 

TTATGGCTACGATGGCAqGTTTAGCATTTTTGGGAAGGACTTCCTGCTCAATTATCCTTT 
540 550 560 570 580 590 

560 570 580 590 600 610 

CTCAACATCAGTGAAGTTATCCACGGGCTGCACCGGCACCCTGGTGGCAGAGAAGCATGT 

CTCAACATCGGTGAAGTTGTCTACTGCCTGCACTGGCACCCTGGTGGCAGAGAAGCACGT 
600 610 620 630 640 650 

620 630 640 650 660 670 

CCTCACAGCTGCCCACTGCATACACGATGGAAAAACCTATGTGAAAGGAACCCAGAAGCT 

CCTCACTGCTGCCCACTGCATACACGATGGGAAAACCTATGTGAAAGGGACACAGAAACT 
660 670 680 690 700 710 

680 690 700 710 720 730 

TCGAGTGGGCTTCCTAAAGCCCAAGTTTAAAGATGGTGGTCGAGGGGCCAACGACTCCAC 

CCGAGTGGGCTTCCTGAAGCCCAAGTATAAAGATGGTGCCGAAGGGGACAACAGCTCGAG 
720 730 740 750 760 770 

740 750 760 770 780 790 

TTCAGCCATGCCCGAGCAGATGAAATTTCAGTGGATCCGGGTGAAACGCACCCATGTGCC 

CTCAGCCATGCCAGACAAGATGAAGTTTCAGTGGATCCGCGTGAAACGCACCCATGTGCC 
780 790 800 810 820 830 

800 810 820 830 840 850 

CAAGGGTTGGATCAAGGGCAATGCCAATGACATCGGCATGGATTATGATTATGCCCTCCT 

CAAGGGGTGGATCAAGGGCAATGCCAATGACATCGGCATGGATTATGACTACGCCCTGCT 
840 850 860 870 880 890 

860 870 880 890 900 910 

GGAACTCAAAAAGCCCCACAAGAGAAAATTTATGAAGATTGGGGTGAGCCCTCCTGCTAA 

GGAACTCAAGAAACCCCACAAAAGACAGTTCATGAAGATTGGTGTGAGTCCTCCAGCGAA 
900 910 920 930 940 950 

920 930 940 950 960 970

GCACCTCCCAGCGGGCAGAATTCACTTCTCTCGTTATCACAy\TCACCGACCACGCAATTT . 

GCAGCTCCCAGGGG(;CAGGATCCACTTCTCTGGTTATGACAATGACCGCCCCGCCA/\TTT 
960 970 980 990 1000 1010 

980 990 1000 1010 1020 1030

GGTGTATCGCTTCTCn^ACGrCAAAGACGAGACCTATGACTTGCTCTACCAGCAATGCCA 

GGTGTAt.'CGCTTCn 5T< 7 AT< JTCA/XAGATGAGAv.'CTACGACCTTCTCTACCAGCAGTGTGA 
1020 1030 1040 1050 1060 1070
1040 1050 1060 1070 1080 1090

TGCCCAGCCAGGGGCCAGCGGGTCTGGGGTCTATGTGAGGATGTGGAAGAGACAGCAGCA

CGCCCAGCCCGGGGCCAGTGGTTCAGGGGTCTATGTGAGGATGTGGAAGAGACCACAGCA 
1080 1090 1100 1110 1120 1130 

1100 1110 1120 1130 1140 1150 

GAAGTGGGAGCGAAAAATTATTGGCATTTTTTCAGGGCACCAGTGGGTGGACATGAATGG

GAAATGGGAAAGAAAAATTATCGGCATCTTTTCAGGGCACCAGTGGGTGGACATGAATGG 
1140 1150 1160 1170 1180 1190 

1160 1170 1180 1190 1200 1210 

TTCCCCACAGGATTTCAACGTGGCTGTCAGAATCACTCCTCTCAAATATGCCCAGATTTG



CTCTCCACAGGATTTCAACGTGGCAGTTAGAATCACGCCTCTTAAATATGCCCAGATTTG 
1200 1210 1220 1230 1240 1250 

1220 1230 1240 1250 1260 1270 

CTATTGGATTAAAGGAAACTACCTGGATTGTAGGGAGGGGTGACACAGTGTTCCCTCCTG 

CTATTGGATTAAAGGAAACTACCTAGATTGCAGGGAGGGGTGACA-TGCGT--CTTCTTG 
1260 1270 1280 1290 1300 1310 

1280 1290 1300 1310 1320 1330 

GCAGCAATTAAGGGTCTTCATGTTCTTATTTTAGGAGAGGCCAAATTGTTTTTTGTCATT 



CCAGCACCAATGG - TCTTTTTGCACTC ATTGTAGGAG AGGC TAGCTTTTTATCATT 

1320 1330 1340 1350 1360 

1340 1350 1360 1370 1380 1390 

GGCGTGCACACGTGTGTGTGTGTGTGTGTGTGTAAGGTGTCTTATAATCTTTTACCTA - - 

G ACTCTTGTG GTGTGAGTCA CATAGTATCTTTTACCTAGT 

1370 1380 1390 1400

1400 1410 1420 1430 1440 1450 

TTTCTTACAATTGCAAG A - TGACTGGCTTTACTATTTGAAAACTGGTTTGTGTATCATAT 

ATTCTTCAAATGGCAAAAATTATTGCCTATATTATTTTAAAACTGTTGTGTG - --CGT- - 
1410 1420 1430 1440 1450 1460 

1460 1470 1480 1490 1500 1510

CATATATCATTTAAGCAGTTTGAAGGCATACTTTTGCATAGAAATAAAAAAAATACTGAT 



- -TATAGCATTTAAGCAGTCTGA^XAGCATACTTTTGCATAGAGACTrTAAA GTA 

1470 1480 1490 1500 1510 

1520 1530 1540 1550 1560 1570 

TTGGGGCAATGAGGAATATTTGACAATTAAGTTAATCTTCACGTTTTTGCAAACTT-TGA 



TTCGGGT.-\ATACGCCCTATTTGACA^\GGAAGTTA-^ACTTTCAGTTTTTGGAGAATTCTAA 
1520 1530 1540 1550 1560 1570 

1580 1590 1600 1610 1620 1630

TrTTTATTTCATCTGAAC'rTGTTTCAAAiJATTTATATTAAATATTTGGCATACAAwAGAT 
TTTTTGTCTGATCCAAACTTGCTTCAGAGGTTTATATCAAATACGTGACACACAGGGAAT
1580 1590 1600 1610 1620 1630 

1640 1650 1660 1670 1680 1690 

ATGAATTCTTATATGTGTGCATGTGT--GTTTTCTTCTGAGATTCATCTTGGTGGTGGGT 



ATGAATTCTTATGTTTGTATATGTATATGTTTTCTTCTGAGAGTCAT 

1640 1650 1660 1670 

1700 1710 1720 1730 1740 1750 

TTTTTTGTTTTTTTAATTCAGTGCCTGATCTTTAATGCTTCCATAAGGCAGTGTTCCCAT 



- ATATTG ATATTTTTGTAATGTG - - TGGT - TATTATGCTTCC A 

1680 1690 1700 1710 

1760 1770 1780 1790 1800 1810

TTAGGAACTTTGACAGCATTTGTTAGGCAGAATATTTTGGATTTGGAGGCATTTGCATGG 



GATAATGATAGCA 

1720 1730 

1820 1830 1840 1850 1860 1870 

TAGTCTTTGAACAGTAAAATGATGTGTTGACTATACTGATACACATATTAAACTATACCT 



1880 1890 1900 1910 1920 1930 

TATAGTAAACCAGTATCCCAAGCTGCTTTTAGTTCCAAAAATAGTTTCTTTTCCAAAGGT 



1940 1950 1960 1970 1980 1990 

TGTTGCTCTACTTTGTAGGAAGTCTTTGCATATGGCCCTCCCAACTTTAAAGTCATACCA 



AAGTCTT - -CAATACCC 

1740 

2000 2010 2020 2030 2040 2050 

GAGTGGCCAAGAGTGTTTATCCCAACCCTTCCATTTAi\CACGATTTCACTCACATTTCTG 



2060 2070 2080 2090 2100 2110

GAACTAGCTAriTrTCAGAAGACAATAATCAGG(3CTrAATTAGAACACGCTGTATTTCCT 



2120 2130 2140 2150 2160 2170

(XCAGCAv\AC;A(»'r't'(ITC7GCCA(rAC'rAAAAACAA'rCATAGCATTTTACCCCTCGATTATAG 
2180 2190 2200 2210 2220 2230

CACATCTCATGTTTTATCATTTGGATGGAGTAATTTAAAATGAATTAAATTCCAGAGAAC 

AATTTATAATGTTTTGGATTC 

1750 1760 

2240 2250 2260 2270 2280 2290 

AATGGAAGCATTGCCTGGCAGATGTCACAACAGAATAACCACTTGTTTGGAGCCTGGCAC 

AAACATT 

1770 

2300 2310 2320 2330 2340 2350 

AGTCCTCCAGCCTGATCAAAAATTATTCTGCATAGTTTTCAGTGTGCTTTCTGGGAGCTA 



TACGTAGTAGTC 

1780 

2360 2370 2380 2390 2400 2410 

TGTACTTCTTCAATTTGGAAACTTTTCTCTCTCATTTATAGTGAAAATACTTGGAAGTTA 



CTTGAAGAGAA 

1790 

2420 2430 2440 2450 2460 2470 

CTTTAAGAAAACCAGTGTGGCCTTTTTCCCTCTAGCTTTAAAAGGGCCGCTTTTGCTGGA 



CAATAA 

1800 

2480 2490 2500 2510 2520 2530 

ATGCTCTAGGTTATAGATAAACAATTAGGTATAATAGCAAAAATGAAAATTGGAAGAATG 



— - TTT ATTGGCT ATATTG AT A - - 

1810 1820 

2540 2550 2560 2570 2580 2590 

CAAAATGGATCAGAATCATGCCTTCCAATAAAGGCCTTTACACATGTTTTATCAATATGA 



2600 2610 2620 2630 2640 2650 

TTATCAAATCACAGCATATACAGAAAAGACTTGGACTTATTGTATGTTTTTATTTTATGG 



2660 2670 2680 2690 2700 2710 

CTCTCGGCCTAAGCACTTCTTTCTAAATGTATCGGAGAAAAAATCAAATGGACTACAAGC 



2720 2730 2740 2750 2760 2770 

ACCTC7rTTGCTGTCCTTCCACCCCACC'rAAACCTGCATTGTAGCA,\TTTGTAACCATATT 
CCCA TATAAG

1830 

2780 2790 2800 2810 2820 2830 

CAGATGGAGCACTGTCACTTAGACATTCTCTGGGGGATTTTCTGCTTGTCTTTCTTGAGC 



ACTGTATCTTA 

1840 

2840 2850 2860 2870 2880 2890 

TTTTTGGAAGGATAATTCTGATAAGGCACTCAAGAAACGTACAACCACAGTGCTTTCTTC 



CAGTGCA - 

1850 

2900 2910 2920 2930 2940 2950 

AAATCATATGAGAAATACTATGCATAGCAAGGAGATGCAGAGCCGCCAGGAAAATTCTGA 

CAGA 



2960 2970 2980 2990 3000 3010 

GTTCCAGCACAATTTTCTTTGGAATCTAACAGGAATCTAGCCTGAGGAAGAAGGGAGGTC 

ATTCC- -CAC GC 

1860 



3020 3030 3040 3050 3060 3070 




TGCTTT 
1870 



3080 3090 3100 3110 3120 3130

AAGTTCACTGAACACCAAGACCAGAATGGATTTTTTTAAAAAAATAGATGTTCCTTTTGT 



3140 3150 3160 3170 3180 3190 

GAAGCACCTTGATTCCTTGATTTTGATTTTTTGCAAAGTTAGACAATGGCACAyVAGTCAA 



-TAGTTTTGA- 



3200 3210 3220 3230 3240 3250 

AATCAAATCAA'rG'rTTAGT'rCACAAGTAGATCTAATTTACTAAAGAATGATACACCCATA 



3260 3270 3280 3290 3300 3310

TCCTATATACAGCT1V\ACTCACAGAACTGTAAAAGAAi'\ATTATAAAATAATTCAAirAT(jT 



AAATAAAA'." 

1880
3320 3330 3340 3350 3360 3370

CCATCTTTTTAGTGATAATAAAAGAAAGCATGGTATTAAACTATCATAGAAGTAGACAGA 



3380 3390 3400 3410 3420 3430 

AAAAGAAAAAAGGACTCATGGCATTATTAATATAATTAGTGCTTTACATGTGTTAGTTAT 



3440 3450 3460 3470 3480 3490 

ACATATTAGAAGCATATTTGCCTAGTAAGGCTAGTAGAACCACATTTCCCAAAGTGTGCT 



TTTCCC 

1890 

3500 3510 3520 3530 3540 3550 

CCTTAAACACTCATGCCTTATGATTTTCTACCAAAAGTAAAAAGGGTTGTATTAAGTCAG 



TTGTAAAAAA 

1900 

3560 3570 3580 3590 3600 3610 

AGGAAGATGCCTCTCCATTTTCCCTCTCTTTATCAGAGGTTCACATGCCTGTCTGCACAT 



3620 3630 3640 3650 3660 3670 

TAAAAGCTCTGGGAAGACCTGTTGTAAAGGGACAAGTTGAGGTTGTAAAATCTGCATTTA 



3680 3690 3700 3710 

AATAAACATCTTTGATCACAAAAAAAAAAAAAAAGGGCGGCCG 

AAAAAAAAAAAAAAAGGGCGGCCG 

1910 1920 
10 20 30 40
GTCGACCCACGCGTCCGGGCTCATGGCGCCGGC GTCGCGGT TGCTC-- 

GTCGACCCACGCGTCCGGT-TCATGGCGGCGGCTGGGCGGCGCGGTCTGCTTTTGCTCTT 
10 20 30 40 50 

50 60 70 80 90 100 

-GCGCTCTGGGCGCTGGCGGCTGTGGCTCTACCCGGCTCCGG-GGCGGAGGGCGACGGCG 

TGTACTATGGATGATGGTGACTGTGATTCTGCCTGCCTCTGGCGAAGGGGGATGGAAACA 
60 70 80 90 100 110

110 120 130 140 150 160 

GGTGGCGCCCGGGCGGCCCGGGGGCCGTGGCGGAGGAGGAGCGCTGCACGGTGGAGCGTC 

GAATGGGCT-GGGAATTGCAGCAGCAGTAATGGAGGAGGAGCGTTGCACAGTGGAGCGTC 
120 130 140 150 160 170 

170 180 190 200 210 220 

GGCCCGACCTCACCTACGCGGAGTTCGTGCAGCAGTACGCCTTCGTCAGGCCCGTCATCC

GGGCACACATCACGTACTCCGAATTCATGCAGCACTATGCCTTCCTCAAGCCCGTCATCT 
180 190 200 210 220 230 

230 240 250 260 270 280 

TGCAGGGACTCACGGACAACTCGAGGTTCCGGGCCCTGTGCTCCCGCGACAGGTTGCTGG



TGCAAGGACTCACGGACAACTCGAAGTTCCGGGCCCTGTGTTCCCGGGAAAACCTGCTAG 
240 250 260 270 280 290 

290 300 310 320 330 340 

CTTCGTTTGGGGACAGAGTGGTCCGGCTGAGCACCGCCAACACCTACTCCTACCACAAAC

CCTCGTTCGGGGACAACATTGTTCGCTTGAGTACAGCCAACACCTACTCCTACCAGAAAG 
300 310 320 330 340 350 

350 360 370 380 390 400 

TGGACTTGCCCTTCCAGGAGTATGTGGAGCAGCTGCTGCACCCCCAGGACCCCACCTCCC 

TGGACCTGCCCTTCCAGGAATATGTGCAACACCTGCTGCAGCCCCAGGATCCTGCATCCC 
360 370 380 390 400 410 

410 420 430 440 450 460 

TGGGCAATGACACCCTCTACTTCTTCGCGGACAACAACTTCACCGACTGCGCCTCTCTCT 



TAGGCAATCACACCCTGTACTTTTTTCGAGACAACAACTTCACTGACTCGGCATCCCTCT 
420 430 440 450 460 470

470 480 490 500 510 520 

TTCGCCACTACTCCCCACCCCCATTTGCCCTGCTGGGAACCCCTCCACCTTACAGCTTTG 
TCCAGCACTACTCTCCGCCACCATTCCGTCTCCTGGGAACCACCCCTGCTTACAGCTTTG
480 490 500 510 520 530 

530 540 550 560 570 580 

GAATCGCAGGAGCTGGCTCGGGGGTGCCCTTCCACTGGCATGGACCCGGGTACTCAGAAG 

GAATTGCAGGAGCTGGATCTGGGGTACCCTTCCACTGGCATGGGCCTGGTTTCTCAGAGG 
540 550 560 570 580 590 

590 600 610 620 630 640 

TGATCTACGGTCGTAAGCGCTGGTTCCTTTACCCACCTGAGAAGACGCCAGAGTTCCACC 

TTATCTATGGTCGGAAGCGCTGGTTCCTCTACCCTCCTGAGAAGACACCTGAGTTCCACC 
600 610 620 630 640 650 

650 660 670 680 690 700 

CCAACAAGACCACGCTGGCCTGGCTCCGGGACACATACCCAGCCCTGCCACCGTCTGCAC 

CTAACAAGACCACATTGGCCTGGCTGCTGGAAATATACCCATCTCTAGCCCTGTCAGCAC 
660 670 680 690 700 710 

710 720 730 740 750 760 

GGCCCCTGGAGTGTACCATCCGGGCTGGTGAGGTGCTGTACTTCCCCGACCGCTGGTGGC 

GGCCTCTAGAATGTACCATCCAGGCTGGTGAAGTACTGTATTTTCCTGATCGGTGGTGGC 
720 730 740 750 760 770 

770 780 790 800 810 820 

ATGCTACGCTCAACCTTGACACCAGCGTCTTCATCTCCACCTTCCTCGGCTAGCCAAAAC 

ATGCC ACACTCAATCTGG ACACCAGTGTCTTCATTTCTACCTTCCTTGGCTAGCC AGA - C 
780 790 800 810 820 830 

830 840 850 860 870 880

AGCTCCCACGACTGCCGGTCACA-CACCAGCACGTCCCACC-TCGTGCTCACGGATTTTA 

AGGCAACTGGCAAGCC - - -CACTCCACCAGCACATCCCAATCTAGTCCTCACAGACTTTA 
840 850 860 870 880 890 

890 900 910 920 930 940 

TTACACAGATAGTGGCGGCAATGCCCTCAGCCCAGCCCACCCTCACCTGCTTTTCCAGCC 

TTACA -GGACAGTGGCAGCAGCAGCAAC - -CTCAGCCCACCCTCACCCACTCT -CCAGCC 
900 910 920 930 940 950 

950 960 970 980 990 

CACAAAGGGGCACGA TCACGGCCCAGCAAAACCGATGCTGAGAGGGGAAACAG 

CA-CAAGCGCGACAAGGGACGCTCATCCTCCACCAAGCCCTATCCTGAGAACGCCACCAC 
960 970 980 990 1000 

1000 1010 1020 1030 1040 1050 

TCCAGAGTCCAACAGCAGAACTTGGGGGAAGCGGTCGCCGTGGCCAGGAACATAAACTAT 

TTCACAACCCATCACCAGCCCC - G ATGGGGGC AGGC CCAGCCACACAAACTA? 

1010 1020 1030 1040 1050 1060
1060 1070 1080 1090 1100 1110

GTATAGGGGCCGGGGGCTTCTG - C - CC AGGGCTCCCCTGG ACC AGGACGCCAGGTAGGGC 

ACA GGGACTGGAGCTTCCGTCTCCAGATC-CTCCTGGGCCAGGGTGCCAGGCAGGAC 

1070 1080 1090 1100 1110 

1120 1130 1140 1150 1160 1170 

AGGGAACCTCAGTAGTCCTCCACCCAGCCATTCTCAGAGATGAATGCGTCAATAACCTCC 

ATGGGGCCTCAATAGTCCTCTACCCAGCCGTTCTCAGAGATGAAAGCGTCAATGACTTCC 
1120 1130 1140 1150 1160 1170 

1180 1190 1200 1210 1220 1230

TTCATAGCCAAGTTGGGGATGAGCTGTTCCTGGGTCAGGGGGCTCCGGGTCACGGGGTCA 



TTCATGGCCAAGTTGGGGATGAGCTGTTCCTGGGTCAAAGGGCTCCGGGTCACAGGGTCA 
1180 1190 1200 1210 1220 1230

1240 1250 1260 1270 1280

AAATGACCC ACACGCTGCA - - - GTG ACAAG AAGGG - CAG AGGGCAGTCATGG - -GGCCCA 

AAGTGGCCCACACGCTGCAACAGAGTCAAGAGTGTTCAATGGCCTGAGTATACCGATCCG 
1240 1250 1260 1270 1280 1290 

1290 1300 1310 1320 1330 1340 

GG - ACCATGCCACT GGCCCTG -CTCCCCCAGCCGCAGGCCTCACCTGCAGGTGCTC 

GGTACCAAGGCTCTCCATGGCCCGGTCTCCATGGGCC-CT - - CCTTACCTGC AGGTGCTC 
1300 1310 1320 1330 1340 1350

1350 1360 1370 1380 1390 1400 

CTCGATGTCCTTGCGGTCGTAGGTGATGCCACTGGGCGTGATGCACGGCTCCCGCATCAG 

CTCAATGTCCTTGCGGTCATAGGTGATACCACTGGGTGTAATGCAGGGTTCCCGCATCAG 
1360 1370 1380 1390 1400 1410 

1410 1420 1430 1440 1450 1460 

CTCAAAGCTGATCTTGCCACACAGGTAGTCGGGGATGTCTCGCTTCTGTGGCACACGGGC 



CTCAAAGCTAATCTTGCCACACAAGTAGTCAGGGATATCTCGCTTCTATAGCACAAGGGG 
1420 1430 1440 1450 1460 1470

1470 1480 1490 1500 1510 
AC ACGGTC AG AGGCTG AAAAGGGGC ACTGC ACG AGCACC - TGCC AGCC ATCGCC A 



AAAA7GTCTAGAACTGGAG-CGGGCTGTGGG -CGTCACCATACCAOC - AGCACCCCATGA 
1480 1490 1500 1510 1520 1530

1520 1530 1540 1550 1560 1570 

GCAAGCGACACACACTCACCTTCCTCTTCTCATCCACCTGAGAAAA-\AGCTCGTCCATGT 



GCTTCCCGGGGTC-CTCACCTTTCTTTTCTCGTCCACCTGAGAGAAGAGCTCATCCATAT 
1540 1550 1560 1570 1580 1590 

1580 1590 1600 1610 1620 1630

CCICCA :gtacttgto:tgtgaaoacttcactcctgtgcttggggga GACACCCC 
CTGCCATGTATTTATCCTG - - CAGAGTTG AGTGCCATGTGTGGGCAACTCCTGTCTCCAC
1600 1610 1620 1630 1640 

1640 1650 1660 
AC CTCCC TCCTCCATGGGGCACA-GAC CCAAC A - — - C A - 



ACAGACACACACACTCTGTCCACCAGGGCACTCATGTCATGCATGGGCCAACAGATCCAC 
1650 1660 1670 1680 1690 1700 

1670 1680 1690 1700 1710 

- - -AGGCGGGGATGCT - - -C - - - CC ACGCCACGTGCACAC AC AC A - - GACCCACATGTGG 



C AAAGGC TGGGGCAC TTTTCATGCCACAC - ACAAAC ACACAC ACAATG ACCCACATGTGG 
1710 1720 1730 1740 1750 1760 

1720 1730 1740 1750 1760 1770 

GTGGGGGGCACCCTCACGTGCTTGGCCTCAATGCAGGCCTGCTGGGCCCGGACGTGGCTG 

ACTAGGGGCACCCTCACGTGCTTGGCCTCAATGCAGGCCTGCTGGGCCCGGATGTGGCCA 
1770 1780 1790 1800 1810 1820 

1780 1790 1800 1810 1820 1830

TCGTCCTCATCACCCTCGTGGTTTCGCTGGCACTCTTCCAGCTCCCTGGGGGTTGACCAG 

TCATCTTCATGACCCTCGTGGTTCCGCTGACACTCCTCCAGTTCCCTGAGGGTTAACCAG 
1830 1840 1850 1860 1870 1880 

1840 1850 1860 1870 1880 

GAGCCGGTCAGAGATGGACCTGGCCAGATGT CTGACCACACCCCAATCTCAGA --GC 



AAGCTAGTTGGTGATGGCCCTGACCAGGAAATCACAGAGCCCGCCCCA-TCTCAGGCCTC 
1890 1900 1910 1920 1930 1940 

1890 1900 1910 

TAACATCCACA-CTTCCC CACATTT -C 



TTTCCTCCTGGGCTTCCCATGTACCGGTTGTTGTCCTTCAATAAAAACACTTGTGCTGGT 
1950 1960 1970 1980 1990 2000 

1920 1930 1940 
CTGCTTG -CCAGTAAAGC CTTCGATAAAC 



GACTCAGTGTCTGCTGGGGGACGGACCCACCTCTCTCGCTCAGCAGCAATGAGCCTGGTG 
2010 2020 2030 2040 2050 2060 

1950 1960 1970
AAAAAAAAAAAAAAAAAAAACCCCCGCCC 



AGATATGAATGCAAAAAAAAAAAAAAAGGGCGGCCG 
2070 2080 2090 2100
10 20 30 40 50 60

Jb^ftlW€ AATTCGGMWCMKXKGWGGVVCCCGGTGGAGTGAGAGGATGGGCGAGCAGTCTGAATGCC 

yol** O G - -TCGACCCACGCGTCCG- -GCTGGCGGAGCAGGAGGATGGGCGAGCAGTCTGAATGCC 
10 20 30 40 50 

70 80 90 100 110 120 

AGAATGGATAACCGTTTTGCTACTGCGTTTGTGATTGCTTGTGTGCTTAGTCTGATTTCC 

AGAATGGATAACCGTTTTGCTACAGCATTTGTAATTGCTTGTGTGCTTAGCCTCATTTCC 
60 70 80 90 100 110 

130 140 150 160 170 180 

ACCATCTACATGGCGGCCTCCATAGGCACGGACTTCTGGTATGAGTATCGAAGTCCCATT 

ACCATCTACATGGCAGCCTCCATTGGCACAGACTTCTGGTATGAATATCGAAGTCCAGTT 
120 130 140 150 160 170 

190 200 210 220 230 240 

CAAGAGAATTCAAGTGACTCGAATAAAATCGCCTGGGAAGATTTCCTCGGTGACGAGGCG 

CAAGAAAATTCCAGTGATTTGAATAAAAGCATCTGGGATGAATTCATTAGTGATGAGGCA 
180 190 200 210 220 230 

250 260 270 280 290 300 

GATGAGAAGACTTACAACGATGTTCTGTTCCGATACAACGGCAGCTTGGGGCTGTGGAGA 

GATGAAAAGACTTATAATGATGCACTTTTTCGATACAATGGCACAGTGGGATTGTGGAGA 
240 250 260 270 280 290 

310 320 330 340 350 360 

CGGTCCATCACCATACCCAAAAACACTCACTGGTATGCGCCACCGGAAACGACAGAGTCA 

CGGTGTATCACCATACCCAAAAACATGCATTGGTATAGCCCACCAGAAAGGACAGACTCA 
300 310 320 330 340 350 

370 380 390 400 410 420 

TTTGATGTGGTTACCAAATGCATCACTTTCACACTAAACGAGCAGTTCATGGAGAAGTAT 

TTTGATGTGGTCACAAAATGTGTGAGTTTCACACT.^ACTGAGCAGTTCATGGAGAA.MTT 
360 370 380 390 400 410 

430 440 450 460 470 480

GTGGACCCCGGCAACCACAATAGCGGCATCGACCTGCTTCGCACCTACCTGTGGCGCTGC 

GTTGATCCCGGAAACCACAATAGCGGGATTGATCTCCTTAGGACCTATCTTTGGCGTTGC 
420 430 440 450 460 470 

490 500 510 520 530 540 

CAGTTCCTTTTACCCTTCGTCAGCTTGCCCTTGATGTGCTTTGGGGCGTTCATTGCCCTC 
CAGTTCCTTTTACCTTTTGTGAGTTTAGGTTTGATGTGCTTTGGGGCTTTGATCGGACTT
480 490 500 510 520 530 

550 560 570 580 590 600 

TGTGCCTGTATCTGCCGCAGCCTGTATCCCACCCTCGCCACTGGCATTCTCCATCTCCTT 

TGTGCTTGCATTTGCCGAAGCTTATATCCCACCATTGCCACGGGCATTCTCCATCTCCTT 
540 550 560 570 580 590 

610 620 630 640 650 

GCAGGTCTGTGCACA CTGGGCTCCGTGAGTTGCTATGTTG- -C- -CGGCATTGA- - 



GCAGGAAATTACTCAGATTCTTGGCTCCATGAATAATTTTAATGATCTTCTACATTATCC 
600 610 620 630 640 650 

660 670 
A CTC TTACATC AGAAAGTAG- - 



TTGATAATTACTCATTTCTCAATAATCTTTTAATTTCATCCCATGACTCTGAGGATAGCT 
660 670 680 690 700 710 

680 690 
AGCT GCC CAAGG ATGT ATCTGG 



TCCAAGCTCTTTAAATGGCCTTACAAACTCATTGGCAAGTTCTATACTTCAGGCACACTG 
720 730 740 750 760 770 

700 

AG AATTT - - GG ATGGT C 



ACCTTTTAGTTTTTCCAGTGGGCCATGCCTATGGTAGTTTAAAAACATGGGCTTAAAATC 
780 790 800 810 820 830

710 

CTTC TGC ; - CTGGC 



CTTCCATCAATCTTCCATTCACATTCCCATCCCCTTGAATCTACCCTGGCTTGTGATCCT 
840 850 860 870 880 890

720 730 
CTG-- - CGTCTC GGC TC 



TTTCACCAATAGAGTGTGCCTGAAATGACACTCTTCTCATGAGGTCCTAAAGATCATGTG 
900 910 920 930 940 950 

740 

-CCTTA - - -CAGTTC 



TCCTTAAACCAGTTCTCTTGGAACACTCACTCTTAGAACATTCCCTCTCCAAACCCAGAT 
960 970 980 990 1000 1010 

750 760 
ATCGC - - GGCCCCT - - CT CTTC ATCTG 



ACCATGCTGTGAAGTCCACGCCACATGGAGGTGTCCTGTGTAGATGCTCCAGCTGAAATC 
1020 1030 1040 1050 1060 1070
770 780 790
GGCTGCCCACA CCAACCG - GAAAG AGTAC



CCAAGCTAAGCTCCCAACTGACAGCCAACATCATTTCCAGCCATGTGTGGGAGCCATCCT 
1080 1090 1100 1110 1120 1130 

800 810 
ACCTTAA TGAAGGCTT ATC 

GGATGTCCAGCCTTAACAAGCCTTCAGAGGACTTCAGCCACAGCTATTATCTTACTACAT 
1140 1150 1160 1170 1180 1190 

820 830 840 
GTGTGGC ATGAAGGG AGGCTG CCTG CT 



CCTTGTGAGACTCTAATAAAGAACCAACTAGCTGAGCCCAATCAACCTATGGAACTGATA 
1200 1210 1220 1230 1240 1250 

850 860 870 
TAATGATTAATATTTTT - C ATACATTTTTTT 



GAAATAAAATGAATTGTTGTTTTGTGCCGCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAG 
1260 1270 1280 1290 1300 1310 



GGCCCCCCC 
1320 
10 20 30 40 50

HJMAU GTCGACCCACGCGTCCGGCGGCTAGGCCCGCGTGCGCTGGAGACCTCCGCGCTGGCCCC- 

: : : . . : . A : v: : ; ; : : ; : : : 

NOftlNf TCCG -GTCCAN-GAAAAAGCT -GCTTGCACTAGGGGCATCC -CGCCTGCCTGG 

10 20 30 40 

60 70 80 90 100 110 

CGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGCATGGGTGGCC

TGAAAGGAACCG- - CAGCAC ACAGGGTGGG AGGGCTTCCG- -ATTTTAGC A- GGGCGGCT 
50 60 70 80 90 100 

120 130 140 150 160 170 

CCCGGCGCGCCG - GCTGGGTGGCGGCGGG - CCTGCTGCTCGGCGCGGGCGCCTGC - -TAC 

TCCGG AAGGCGG AGCTC - - C AACCCCATTTCCT - - TTCTCTGGGCTGGTTCTGGCCC AGC 
110 120 130 140 150 160 

180 190 200 210 220 230 

TGCATTTAC AGGCTGACCCGGGGTCGGCGGCGGGGCG ACCGCGAGCTCGGG ATACGCT - C 

TGCACCTGCGTG - TCGCCCTGGCTCCTCGGCT C -CCTGC - AGCTCCGAGGCAGCAGC 

170 180 190 200 210 

240 250 260 270 280 290 

TTCGAAGTC -CGCAGGTGCCCTGGAAGAAGGGACGTCAGAG- -GGTCAGTTGTGCGGGCG 

ATGGGTCCCC.CCCGGGA - -CGTCCGCTCGCTGGCACCAGCCCTCCTCCTCCGCCCCCCCG 
220 230 240 250 260 270 

300 310 320 330 340

CTCGGC - - C CGGCCT - C AG ACGGG AGGT ACCTGGG AGTC AC AGTG -GTCCAAG-A 

C-CTGCTACTGTATCTACCGGCTGACTCCCXjC-ACCCCGCCGAGGCGTCGCGACCA7GCG 
280 290 300 310 320 330

350 360 370 380 390

CC - -TCGCAC-CC - -TGAAGACTTAACTGATGGTTCATATGATGATGTTCTAAATCCTGA 
CCCTTCGCGATCCGCAGAAGACCTAACCGATGGCTCCTATGACGATATCTTAAATGCAGA
340 350 360 370 380 390 

400 410 420 430 440 450 

ACAACTTCAGAAACTCCTTTACCTGCTGGAGTCAACGGAGGATCCTGTAATTATTGAAAG 

GCAGCTTAAGAAACTTCTGTATCTGCTGGAGTCAACCGACGATCCTGTCATTACTGAAAA 
400 410 420 430 440 450 

460 470 480 490 500 510

AGCTTTGATTACTTTGGGTAACAATGCAGCCTTTTCAGTTAACCAAGCTATTATTCGTGA

GGCCTTGGTCACCTTGGGAAATAATGCAGCCTTCTCCACTAACCAGGCCATTATTCGTGA 
460 470 480 490 500 510

520 530 540 550 560 570 

ATTGGGTGGTATTCCAATTGTTGCAAACAAAATCAACCATTCC - - AACCAGAGTATTAAA 

GTTGGGTGGTATCCCAATTGTTGG AAACAAAATCAAC - - TCCCTG AACCAAAGTATTAAA 
520 530 540 550 560 

580 590 600 610 620 630 

GAGAAAGCTTTAAATGCACTAAATAACCTGAGTGTGAATGTTGAAAATCAAATCAAGATA 



GAGAAAGCTTTAAATGCACTGAATAACCTGAGTGTGAATGTTGAAAATCAAACTAAGATA 
570 580 590 600 610 620 

640 650 660 670 680 690 

AAGATATACATCAGTCAAGTATGTGAGGATGTCTTCTCTGGTCCTCTGAACTCTGCTGTG 

:::::: :::::::: : : : : : : :X. 
AAGATATACGTCCCTCAAGTCTGTGAGGACGTCTTTGCTGAC 
630 640 650 660 670 

700 710 720 730 740 750 

CAGCTGGCTGGACTGACATTGTTGACAAACATGACTGTTACCAATGACCACCAGCACATG 
T182.hum.pep MMMTQARVL VAAWGL VAVLL YPJS IHK I EEGHLA VYYRGGALLTS PSGPGYHIMLPF ITTFRSVQT

T182.mus.pep MNMTQARLL VAAWGL VA ILL Y.^S IHK IEEGHLA VYYRGGALLTS PSGPGYHIMLPFITTFRSVQT

T181.hum.pep MAQLGA WA VAS S FFCASL FSPjVHK-I EEGHIGVYYRGGALLTSTSG PGFHLML PF ITS YKSVQT

T181.mus.pep MAQLGA WA VAS S FFCAS L FSPiVHK I EEGHIGVYYRGGAL LTSTSGPGFHLMLPF ITSYKSVOT



T182.hum.pep TLQTDEVKNVFCGTSGGV*^^

T182.mus.pep TLQTDEVKEJVPOCTSGGVMIYTDRISWNML^

T181.hum.pep TLQTDE^/KWPCGTSGGVTCIYFDRIH7/VNFL

T181.mus.pep TLQTOEVKNVFCGTSGGVmYFDRIEW^^



T182.hum.pep KTLQ EVYIELFDQ IDEDJLKQALQKDLNLMAPGLTIQAVRVTKPKI PEAIRRNFEMEAEKTKLLIA

T182.mus.pep OTLQEVYIELFCQIDEMiKQALQKD^

T181.hum.pep KTLQEVYIELFCQ IDENLKIALQQDLTS&APGL VIQA VRVTKPNI PEAIRRNYEIJ'IESEKTKLLIA

T181.mus.pep KTLQEVYTELFCQ IDDJLKIALQQDLTSMAFGL VIQAVR VTKPMT P EAIRKNYELMESEKTKLLIA



T182.hum.pep XC KQKWEKEAETERKKA VI EA£K I AQ VAK I RFQQKVMEKETEKR I S EI EDAAFLAKEKAKADAEY

T182.mus.pep AQ KQKWEKELAETERKRA V I EAEK I AQ VAK IRFQQ KVMEKETEKR I S EIEDAAFLAREKAKADAEY

T181.hum.pep AQKQKWEKEAETERKKAL I EAEKVAQ VAEITYGQKVMEKETEKKISEI EDAAFLAREKAKADAEC

T181.mus.pep AQKQKVVEKEAETERKKAL I EAEKVAQ VAEITYGQKVMEKETEK . . .



T182.hum.pep Y AAHKYATSNKHKLTPEYL EL KKYQA IASNSKI YFGSNI FNMFVDSSCALKYSDIRTGRESSL PSK

T182.mus.pep YAAHKYATSNKHKLTPEYXELKKYQAIASNSKI YFGSNI PSMFVDSSCALKYSCGRTGREDSL P PE

T181.hum.pep YTAMKIAEANKLKLTPEYLQLMKYKAL^ - SKQFEGLADK

C42C1 . a \1<AQKQADSNKILLTKEYLELQKIRAIASNNKI YYGDS I PQAFV- -MGTTQQTV 



T182.hum.pep EALEPSGQJVIQ- -NKESTG
T182.mus.pep EAREPSCESPIQ- - NKQ/AC
T181.hum.pep LSFGLE - DEPLETATKEN
10 20 30 40 50 60

inputs MATLWGGLLRLGSLLSLSCLALSVLLLAQLSDAAKNFEDVRCKCICPPYKENSGHIYNKN 

MK LLSLVAW- -GCL LVPPAEANKSSED IRCKCICPPYRNISGHIYNQN 

10 20 30 40 

70 80 90 100 110 120 

inputs ISQKDCDCLKWEPMPVRGPDVEAYCLRCECKYEERSSVTIKVTI I IYLSILGLLLLYMV 

VSOKDCNCLHWEPMPVPGHDVEAYCLLCECR YEERSTTTI KVI IVT YLS WGALU* YMA 
50 60 70 80 90 100 

130 140 150 160 170 180 

inputs YLTLVEPrLKRRLFGHAQLIQSDDDIGDHQPFANAHDVLARSRSRANVLNJCVEYAQQRWK 

FLMLVDP - LIRKPDAYTEOLHNEEENEDARSMAAAAASLGGPRA-NTVLERVEGAQORWK 
110 120 130 140 150 160 

190 

inputs LQVQEQRKSVFDRHWLSN 



LQVQEORKTVFDRHKMLSN 
170 180 
10 20 30 40 50 60

inputs ^SLWCGNLLRLGSGLSMSCLALSVLLIAQLTGAAKNFEDVRCKCICPPYKEOTGHIYNK 

M KLLCLVAW- -GCL LVPPAQANKSSEDIRCKCICPPYRNISGHIYNQ 

10 20 30 40 

70 80 90 100 110 120 

inputs NISQKDCDCLHVVEPMPVRGPDVEAYCLRCECKYEERSSVTIKVTIIIYLSILGLLLLYM 

NVSQKDCNCLHWEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSWGALLLYM 
50 60 70 80 90 100 

130 140 150 160 170 180

inputs VYLTLVEPILKRRLFGHSQLLQSDDDVGDHQPFANAHDVLARSRSRANVLNKVEYAQQRW 

AFLMLVDP - LIRKPDAYTEQLHNEEENEDARTMATAAASIGGPRA-NTVLERVEGAQORW 
110 120 130 140 150 160

190 200 
inputs KLQVQEQRKSVFDRHWLSN 

KLQVQEQRKTVFDRHKMLSN 
170 180 
Input file T187human1; Output File T187human1.pat
Sequence length 2490 

CCACGCCTCCCGCCAGGGCCGGGAGGGACCAATGGTTGCTTCACCCCCCGCCGCAACACACCGGAAGCTCCGCTCTGGG 79 

TTGCGGGCCCCGGCGTCTCCGCGTGGGCCGCACCGTCCCACCCGCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCCGGACGCGCCGCGCTGGAGTGGGCGGTTATAGGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTCGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAACLltGAGACY 22 

GGC CCC CGG CGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRITRGRRRGORELGIRS 42 

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 



SKSAGALEEGTSEGOL-CGRS 62 
TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



ARPaTGGTWESOUSJCTSxPE 82 
GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAN CCT GAA 631 

0 L TO G S rOOVLNAEQLQ'KLL 102 
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

Y l L E S TEOPV I I ERAL I TIG 122 
TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

N M A A F S V N Q A I IRELGGIP] 142 
AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT 811 

V A N K I MHSNOS I KEKALNAL 162 
GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA 871 

NNLSVMVENOIKIKVQVLKL182 
AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG 931 



LLMLSENPAHTEGLLRAQVD202 
CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT 991 



SSFISLY0SHVAKEILLRVL222 
TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT 1051 

rLFQNIKNCLICIEGHI.AVOP242 
ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT 1111 

TFT6GSLFFLLHGEECAOJCI262 
ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA CAA GAA TGT GCC CAG AAA ATA 1171

RAIVOHHDAEVKEKVVTI IP 282 
AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC 1231 

* 1 * 285 
AAA ATC TGA 1240 

TrGGTCATATTrrTCCAAAGAGTAATGCAGTCTGCATATAAArGTArTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1319 

C TGC TAAA r T TAAA CAG TAAATATCACATT TTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGAT TTA TTTTGG 1398 

ACTA TTT TGA TGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1477 

GTTATCTTCCCTACArCAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGTCATTTGCAGTT 1556 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1635 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAAfCTAT 1714
TTTCGTCACTTCTACTCAATCAAAAArGTAAACTTTTAGCACAGAATCTTTCCTACCACTCACCCACTCCATTCAATCT 1793
TACATATAAAATAGTGTGATCAATCACAATGTCCArCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1872 
CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCACCACTTTCGGAGGCTGAGGCGGGCAGATCACCTGAGArCGGGA 1951 
GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2030 
GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2109 
ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2188 
TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGT7TTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2267 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGrGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2346 
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGCAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2425 
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2490 
Input file T187human23; Output File T187human23.pat
Sequence length 2595 

CCACGCCTCCCCCCAGCCGCGGGAGCGAGCAATGGTTCCTTCACGCCCCCGGGCAAGAGACGGCAAGCTCCCCTCTGCC 79 

TTGCGGGCCCCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCGCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCGCCGGGCTGGAGTGGGCGGTTATACGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22 
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAG 451 

C1YRLTRGRRRG0RELGIRS 42 
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CCC TCT 511 

5KSAE0L TOG SYDOVL N A E Q 62 
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LOKLLYLL6STE0PVIIERA 82 
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LI TLGNNAAFSVNQtPMJCLV 102 
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC 691 

TGI TFA I I RELGG IP I V A N K 122 
ACT GGC ATC ACA TTC GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA 751 

lNHSNaSIKEJCAlNAlNNlS142 
ATC AAC CAT TCC AAC CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG ACT 811 

VMVENOIKIKIYISQVCEDV 162 
GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT GTC 871 

FSGPLMSAVQLAGLTLLTMM182 
TTC TCT GGT CCT CTG AAC TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG 931 

TVTNDHQHMLHSYl TDLFOV 202 
ACT GTT ACC AAT GAC CAC CAG CAC ATG CTT CAC AGT TAC ATT ACA GAC CTG TTC CAG GTG 991 



LLTGNGNTKVQVLKLLLNLS222 
KTA CTT ACT GGA AAT GGA AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT 1051 



EWPAM FECI LRAOVOSSFLS 242 
GAA AAT CCA GCC ATC ACA GAA GGA CTT CTC CCT GCC CAA GTG GAT TCA TCA TTC CTT TYC 1111 

LYDSHVAKE I LLRVLTLFQN 262 
CTT TAT GAC AGC CAC GTA GCA AAG GAC ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT 1171 

1KNCIKI EGHIAVQPTFTEG 282 

ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT 1231 

SLFFLLHGEECAOICIRAIVD 302 

TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT 1291 

HHOAEVKEKVVTl IPKI • 320 

CAC CAT GAT GCA GAG GTG AAG CAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TCA 1345 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1424 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1503 

ACTATTTTGArGCCAAGTCAATATAAGAGCTTCTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1582

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1661 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTCTTCTTTTAGTAGCAATGAA 1740 

ArCCTAAGCTCTTGAGGCCATTCACCTCCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1819 
TTTGGTCACTTCTACTCAATCAAAAATGTAAACTTTTAGCAGAGAATGTTTCCTACGACTCACCCACTCCATTCAATGT 1898
TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1977 
CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCACCACTTTGGGACGCTGAGGCGCGCAGATCACCTGAGATCGGGA 2056 
GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGCCATGGTGGTGCAT 2135 
GCCrGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGCGAGGCAGAGGTTGCAGTGAGCTGAG 2214
ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGACCAAAACTC7GTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2293 
TGTGCTTAAGTGGAAAGArATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2372 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2451 
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGCAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2530 
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2595 
Input file T187human123; Output File T187human123.pat
Sequence length 2700 

CCACGCGTCCGGCCACGGGCGCGAGGGAGGAATCGTTCCTTCACGCCCCGCGGGAAGAGACGGGAAGCTCGGCTCTCGG 79 

TTGCCGGCCCCGCCGTCTCCGCGTGGGGCGCACCGTCCGACCCGCCCCTCCCGGTGTGCAGCGCCCCCCACCCCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCCCCCGGCTGGAGTGGGCGGTTATAGGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACArCCAGGTCAGGTGGCGTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGC 316 

M G 2 

AGACCTCCGCCCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCCGCGGCAGC ATG GGT 391 

GPRGAGUVAAGL LLGAGACY 22 

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAG 451 

CJYRITRGRRRGORELGIRS 42 

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 



SKSAGAlEEGTSEGQLCGRS 62 
TCG AAG TCC CCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



ARPOTGGTUESOWSKTSOPE 82 
GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAN CCT GAA 631 

OLTOGSYOOVLNAEOLQICLL 102 
GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

Y l I E STEDPVI IERAL ! TIG 122 
TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNOIPMKLVTGITF 142 
AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC ACT GGC ATC ACA TTC 811 

A! IRELGGIPIVANKINHSN 162 
CCT ATT ATT CCT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC 871 

QS [KEKAINAINNLSVNVEN 182 
CAG ACT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT 931 

QIICIKIYtSOVCEOVFSGPL 202 
CAA ATC AAG ATA AAG ATA TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG 991 

NSAVOLAGLTILTNHTVTNO 222 
AAC TCT GCT GTG CAG CTG GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC 1051 

HOHMLHSY I TOLFQVLITGN 242 
CAC CAG CAC ATG CTT CAC AGT TAC ATT ACA GAC CTG TTC CAG GTG KTA CTT ACT GGA AAT 1111 

GNTKVOVLKllLNtSENPAM 262 
GGA AAC ACG AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG 1171 

TEGL LRAOVO SSFLSIYDSH 282 
ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC 1231 

V A K E I ILRVlTLFQNtKNCL 302 
GTA GCA AAG GAG ATT CTT CTT CCA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC 1291 

KIEGHLAVOPTFTEGSLFFL 322 
AAA ATA GAA CGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG 1351 

IHGEECAQK I RALVOHHOAE 342 
TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG 1411

VKEJCVVTIlPKt* 355 
GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1450 

rTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1529 

CTGCTAAATTTAAACAGTAAATATCACATTTTCTCATTAACACACCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1608 

ACTA TTT rGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAArGCTT 1687 
GTTATCTTCCCTACATGMCTCGCACTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAACTGATTTCCAGTT 1766
ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1845 
ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1924 
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTCCTAGGACTCACCCACTCCATTCAATGT 2003 
TACATATAAAATAGTGTGATCAATCACAArGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 2082 
CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 2161 
GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2240 
GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2319 
ATAGCGCCATTCCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2398 
TGTGCTTAAGTGCAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2477 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2556 
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2635 
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2700 





Input file T187human12; Output file T187human12.pat 
Sequence length 2523 

CCACCCCTCCCGCCAGCGGCGGGAGCGAGCAATGGTTGCTTCACGCCCCGGGGGAAGAGACGGGAAGCTCGGCTCTGCG 79 

TTGCGGCCCCCCGCGTCTCCGCGTGGGGCGCACCGTCCCACCCGCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCGCCGGGCTGGAGTGGGCGGTTATAGGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22 

GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42 

TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 



SKSAGALEEGTSEGQLCGRS 62 
TCG AAG TCC GCA GGT GCC CTG GAA GAA GGG ACG TCA GAG GGT CAN TTG TGC GGG CGC TCG 571 



ARPQTGGTWESQWSKTSXPE 82 

GCC CGG CCT CAG ACN GGA GGT ACC TGG GAG TCA CAG TGG TCC AAG ACC TCG CAM CCT GAA 631 

DLTDGSYDDVLNAEQLQKLL 102 

GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA CTT CAG AAA CTC CTT 691 

YLLESTEDPVIIERALITLG 122 

TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT TTG ATT ACT TTG GGT 751 

NNAAFSVNQIPMKLVTGITF 142 

AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC ACT GGC ATC ACA TTC 811 

AIIRELGGIPIVANKINHSN 162 

GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC 871 

QSIKEKALNALNNLSVNVEN 182 

CAG AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT 931 



QIKIKVQVLKLLLNLSENPA 202 
CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC 991 



MTEGLLRAQVDSSFLSLYDS 222 
ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC 1051 

HVAKEILLRVLTLFQNIKNC 242 
CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC 1111 

LKIEGHLAVQPTFTEGSLFF 262 
CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC 1171 

LLHGEECAQKIRALVDHHDA 282 
CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA 1231 

EVKEKVVTIIPKI* 296 
GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1273 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGCGATTCTCCCAG 1352 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTCGTTCTCAGATTTATTTTGG 1431 

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1510 

GTTATCTTCCCTACATGAAGTGCCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1589 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1668 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTCCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1747 
TTTCGTCACTTCTAGTCAATGAAAAATGTAAACTTTTACGAGAGAATCTTTCCTACGACTCACCCACTCCATTCAATGT 1826 

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1905 

CCGTGCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGCCAGATCACCTGAGATCGGGA 1984 

GTTTGAGACCAACCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2063 

GCCTGTAATCCCAGCTACTTGGGACGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2142 

ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2221 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTCGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2300 

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2379 

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2458 

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAACGGGAGAAAAATCAAAAAAAAAAAAAAA 2523 
Input file T187human2; Output file Thuman2.pat 
Sequence length 2418 

CCACCCCTCCGGCCAGGGGCGGGAGGGAGCAATCCTTGCTTCACGCCCCCGCCGAAGAGACGGGAAGCTCGGCTCTCGC 79 

TTGCGGGCCCCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCGCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACCCCCCGGGCTGGAGTGGGCGGTTATAGGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCGCCTAGGCCCGCGTGCCCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22 
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42 
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 

SKSAEDLTDGSYDDVLNAEQ 62 
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LQKLLYLLESTEDPVIIERA 82 
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LITLGNNAAFSVNQIPMKLV 102 
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA ATC CCT ATG AAG TTG GTC 691 

TGITFAIIRELGGIPIVANK 122 
ACT GGC ATC ACA TTC GCT ATT ATT CGT GAA TTG GGT GGT ATT CCA ATT GTT GCA AAC AAA 751 

INHSNQSIKEKALNALNNLS 142 
ATC AAC CAT TCC AAC CAG AGT ATT AAA GAG AAA GCT TTA AAT GCA CTA AAT AAC CTG AGT 811 

VNVENQIKIKVQVLKLLLNL 162 
GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG CAA GTT TTG AAA CTG CTT TTG AAT TTG 871 



SENPAMTEGLLRAQVDSSFL 182 
NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT GCC CAA GTG GAT TCA TCA TTC CTT 931 



SLYDSHVAKEILLRVLTLFQ 202 
TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT CTT CGA GTA CTT ACG CTA TTT CAG 991 

NIKNCLKIEGHLAVQPTFTE 222 
AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT TTA GCT GTG CAG CCT ACT TTC ACT GAA 1051 

GSLFFLLHGEECAQKIRALV 242 
GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT GCC CAG AAA ATA AGA GCT TTA GTT 1111 

DHHDAEVKEKVVTIIPKI* 261 
GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA ACA ATA ATA CCC AAA ATC TGA 1168 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1247 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1326 

ACTATTTTGATGCCAAGTGAATATAAGACCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1405 

GTTATCTTCCCTACATCAAGTGCCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTCATTTGCAGTT 1484 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1563 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1642 

TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGAGAATGTTTCCTAGGACTCACCCACTCCATTCAATGT 1721 

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1800 

CCCTCCTGGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 1879 

CTTTGAGACCAAGCCTCACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGCCCATGGTGGTGCAT 1958 
GCCTCTAATCCCAGCTACTTCGCAGGCCGACGCACCAGAATTGCTTGAACCCCGCACGCACAGGTTCCAGTCACGTCAG 2037 

ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2116 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2195 

TGTGTGTGTGTGTGTGTGTGTGTGTGTCTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2274 

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2353 

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGCAGAAAAATCAAAAAAAAAAAAAAA 2418 
Input file T187human3; Output File T187human3.pat 
Sequence length 2562 

CCACGCGTCCCGCCAGGCCCCGGAGCGACGAATGCTTGCTTCACGCCCCCCCGCAAGACACGGGAACCTCCCCTCTGCG 79 

TTGCGGGCCCCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCGCCGGGCTGGAGTGGGCGGTTATACGCTTTGAGCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCGGCTAGGCCCGCGTGCGCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22 
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42 
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 

SKSAEDLTDGSYDDVLNAEQ 62 
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LQKLLYLLESTEDPVIIERA 82 
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LITLGNNAAFSVNQAIIREL 102 
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG 691 

GGIPIVANKINHSNQSIKEK 122 
GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG AGT ATT AAA GAG AAA 751 

ALNALNNLSVNVENQIKIKI 142 
GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG ATA 811 

YISQVCEDVFSGPLNSAVQL 162 
TAC ATC AGT CAA GTA TGT GAG GAT GTC TTC TCT GGT CCT CTG AAC TCT GCT GTG CAG CTG 871 

AGLTLLTNMTVTNDHQHMLH 182 
GCT GGA CTG ACA TTG TTG ACA AAC ATG ACT GTT ACC AAT GAC CAC CAG CAC ATG CTT CAC 931 



SYITDLFQVLLTGNGNTKVQ 202 
AGT TAC ATT ACA GAC CTG TTC CAG GTG KTA CTT ACT GGA AAT GGA AAC ACG AAG GTG CAA 991 



VLKLLLNLSENPAMTEGLLR 222 
GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC CGT 1051 



AQVDSSFLSLYDSHVAKEIL 242 
GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT CTT 1111 

LRVLTLFQNIKNCLKIEGHL 262 
CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT TTA 1171 

AVQPTFTEGSLFFLLHGEEC 282 
GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA TGT 1231 

AQKIRALVDHHDAEVKEKVV 302 
GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT GTA 1291 

TIIPKI* 309 
ACA ATA ATA CCC AAA ATC TGA 1312 



TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGGGGATTCTCCCAG 1391 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTGCCGTGGTTCTCAGATTTATTTTGG 1470 

ACTATTTTCATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1549 

GTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTCAAGTGATTTGCAGTT 1628 

ACTCATCTGAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1707 
ATCCTAACCTCTTCACGCCATTCACCTGCCAACCTCACCATACTCCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1786 
TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTAGGAGACAATGTTTCCTAGGACTCACCCACTCCATTCAATGT 1865 
TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGGTTAAATAAATTATCTGGTCTTTGAAAAGA 1944 
CCGTGCTGGGCGCCGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGCAGATCACCTGAGATCGGGA 2023 
GTTTGAGACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAAGAATACAAAATTAGCTGGGCATGGTGGTGCAT 2102 
GCCTGTAATCCCAGCTACTTGGGAGGCCGAGGCAGGAGAATTGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGGTGAG 2181 
ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCGAA 2260 
TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2339 
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2418 
CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2497 
AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2562 
Input file T187human; Output File T187human.pat 
Sequence length 2385 

CCACCCGTCCCGCCAGGCCCGGGAGGGAGCAATCCTTGCTTCACGCCCCCGGCGAAGACACGGGAAGCTCGCCTCTGCG 79 

TTGCGGGCCCCGGCGTCTCCGCGTGGGGCCCACCGTCCGACCCGCCCCTCCCGGTGTGCAGCGCCCCGCACCGCCCCGC 158 

CTCGCCTGGGAGAAGCCGCCGGGACGCGCCGGGCTCGAGTGGGCGGTTATAGGCTTTGACCTAGGCCGTTTCCGGGAGG 237 

CGGAGCTCAGACCCCATTTCCTTTCTCCACATCCAGGTCAGGTGGCGTTTGCTGTGGCCGCTAGGCCCGCGTGCGCTGG 316 

M G 2 

AGACCTCCGCGCTGGCCCCCGCGAGCCTCCTGCCCTGGCCCGGCGCTGCGGCTCTGCCGCGGCGGCAGC ATG GGT 391 

GPRGAGWVAAGLLLGAGACY 22 
GGC CCC CGG GGC GCG GGC TGG GTG GCG GCG GGC CTG CTG CTC GGC GCG GGC GCC TGC TAC 451 

CIYRLTRGRRRGDRELGIRS 42 
TGC ATT TAC AGG CTG ACC CGG GGT CGG CGG CGG GGC GAC CGC GAG CTC GGG ATA CGC TCT 511 

SKSAEDLTDGSYDDVLNAEQ 62 
TCG AAG TCC GCA GAA GAC TTA ACT GAT GGT TCA TAT GAT GAT GTT CTA AAT GCT GAA CAA 571 

LQKLLYLLESTEDPVIIERA 82 
CTT CAG AAA CTC CTT TAC CTG CTG GAG TCA ACG GAG GAT CCT GTA ATT ATT GAA AGA GCT 631 

LITLGNNAAFSVNQAIIREL 102 
TTG ATT ACT TTG GGT AAC AAT GCA GCC TTT TCA GTT AAC CAA GCT ATT ATT CGT GAA TTG 691 

GGIPIVANKINHSNQSIKEK 122 
GGT GGT ATT CCA ATT GTT GCA AAC AAA ATC AAC CAT TCC AAC CAG AGT ATT AAA GAG AAA 751 

ALNALNNLSVNVENQIKIKV 142 
GCT TTA AAT GCA CTA AAT AAC CTG AGT GTG AAT GTT GAA AAT CAA ATC AAG ATA AAG GTG 811 



QVLKLLLNLSENPAMTEGLL 162 
CAA GTT TTG AAA CTG CTT TTG AAT TTG NCT GAA AAT CCA GCC ATG ACA GAA GGA CTT CTC 871 



RAQVDSSFLSLYDSHVAKEI 182 
CGT GCC CAA GTG GAT TCA TCA TTC CTT TYC CTT TAT GAC AGC CAC GTA GCA AAG GAG ATT 931 

LLRVLTLFQNIKNCLKIEGH 202 
CTT CTT CGA GTA CTT ACG CTA TTT CAG AAT ATA AAG AAC TGC CTC AAA ATA GAA GGC CAT 991 

LAVQPTFTEGSLFFLLHGEE 222 
TTA GCT GTG CAG CCT ACT TTC ACT GAA GGT TCA TTG TTT TTC CTG TTA CAT GGA GAA GAA 1051 

CAQKIRALVDHHDAEVKEKV 242 
TGT GCC CAG AAA ATA AGA GCT TTA GTT GAT CAC CAT GAT GCA GAG GTG AAG GAA AAG GTT 1111 

VTIIPKI* 250 
GTA ACA ATA ATA CCC AAA ATC TGA 1135 

TTGGTCATATTTTTCCAAAGAGTAATGCAGTCTGGATATAAATGTATTTTCTGTCTTCCTTATAAGCGGATTCTCCCAG 1214 

CTGCTAAATTTAAACAGTAAATATCACATTTTGTCATTAACACAGCTATAACTTCCCGTGGTTCTCAGATTTATTTTGG 1293 

ACTATTTTGATGCCAAGTGAATATAAGAGCTTGTACTGAAACCATTTATTTCTTTCTATTTTGCTATTTGCAAATGCTT 1372 

CTTATCTTCCCTACATGAAGTGGCAGTAACCTTTTTCACATTTAAGCTACCCTTCTACCTTTTGAAGTGATTTGCAGTT 1451 

ACTCATCTCAGACAGCATCAGTATTTGACTAAATCATTGTTTCACAACTGAATAGTCTTGTTCTTTTAGTAGCAATGAA 1530 

ATCCTAAGCTCTTGAGGCCATTCACCTGCCAACCTGACCATACTGCTTTCAAAAGTCTTTTCTCATCAGTAGAATCTAT 1609 

TTTGGTCACTTCTAGTCAATGAAAAATGTAAACTTTTACGAGAGAATGTTTCCTACGACTCACCCACTCCATTCAATGT 1688 

TACATATAAAATAGTGTGATCAATCACAATGTCCATCTTTAGACAGTTGCTTAAATAAATTATCTCGTCTTTGAAAAGA 1767 

CCGTGCTCGGCGCGGTGGCTCTTGCCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGCGCAGATCACCTGAGATCGGGA 1846 

GTTTGACACCAAGCCTGACCAATATGGAGAAACCCTGTCTCTACTAACAATACAAAATTAGCTGGGCATGGTGGTGCAT 1925 
GCCTGTAATCCCAGCTACTTGCGACGCCGAGGCAGGACAATTGCTTGAACCCGGCAGGCAGAGGTTGCAGTGACGTGAG 2004 

ATAGCGCCATTGCACTCCAGCCTGGGCAACAAGAGCAAAACTCTGTCTCAAAAAAAAAAAAAAATGATGGAGCTCCCAA 2083 

TGTGCTTAAGTGGAAAGATATCTATGAAATATGGTGGTTTTTTAAAACACAAAAATTATAGAATATGGGATCCCGTGTG 2162 

TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGAATGAAAAATGCTTATGTATTGACAGAACACTT 2241 

CTAGAATGATACCCAAACTCCTGGAGTGGGAGTGGGGAATGCCTTCTACGTACACACTGTTCTACTGTTTGAATTTTTT 2320 

AATATGAGCCCAAATTGTATAATCTTTTTTTAATAAAGGGGAGAAAAATCAAAAAAAAAAAAAAA 2385 
Input file T181AtmX181a; Output File T181AtmX181a.pat 
Sequence length 3919 

CGGGTCTCCCGGTTTCTACGGTTGCACGGGCGTTCGCCTCTCTACGGAGCGCCTGGAGCGACACCCTGCATACACGTTC 79 

MAQLGAVVAVASSFFCAS 18 
ACTG ATG GCT CAG TTG GGA GCT GTT GTG GCC GTG GCT TCC AGT TTC TTT TGT GCA TCT 137 

LFSAVHKIEEGHIGVYYRGG 38 
CTC TTC TCA GCT GTG CAC AAG ATA GAA GAG GGA CAT ATT GGA GTA TAT TAC AGA GGT GGT 197 

ALLTSTSGPGFHLMLPFITS 58 
GCC CTG CTG ACC TCC ACC AGT GGC CCG GGT TTC CAT CTC ATG CTC CCG TTC ATC ACA TCC 257 

YKSVQTTLQTDEVKNVPCGT 78 
TAT AAG TCT GTA CAG ACC ACT CTC CAA ACT GAT GAA GTG AAG AAC GTA CCA TGT GGA ACC 317 

SGGVMIYFDRIEVVNFLVPN 98 
AGT GGT GGT GTG ATG ATC TAC TTT GAC AGA ATT GAA GTG GTG AAC TTC CTG GTC CCA AAT 377 

AVYDIVKNYTADYDKALIFN 118 
GCA GTG TAT GAT ATA GTG AAG AAC TAT ACT GCA GAC TAT GAC AAG GCC CTC ATC TTC AAC 437 

KIHHELNQFCSVHTLQEVYI 138 
AAG ATC CAT CAT GAG CTT AAC CAG TTC TGC AGC GTT CAT ACT CTT CAG GAA GTC TAT ATC 497 

ELFDQIDENLKLALQQDLTS 158 
GAG CTG TTT GAT CAA ATT GAT GAA AAC CTC AAG TTG GCT TTG CAG CAG GAC CTG ACT TCC 557 

MAPGLVIQAVRVTKPNIPEA 178 
ATG GCC CCT GGG CTG GTT ATC CAA GCT GTG CGA GTG ACA AAG CCC AAT ATA CCT GAG GCA 617 

IRRNYELMESEKTKLLIAAQ 198 
ATC CGC AGG AAC TAT GAG CTG ATG GAA AGC GAG AAG ACG AAG CTT CTC ATT GCA GCC CAG 677 

KQKVVEKEAETERKKALIEA 218 
AAG CAG AAG GTG GTG GAA AAG GAG GCA GAA ACA GAG AGG AAG AAG GCC CTC ATT GAG GCA 737 

EKVAQVAEITYGQKVMEKET 238 
GAA AAA GTG GCA CAG GTT GCA GAA ATC ACC TAT GGG CAA AAG GTG ATG GAG AAG GAG ACA 797 

EKKISEIEDAAFLAREKAKA 258 
GAG AAG AAG ATC TCA GAA ATT GAA GAT GCT GCG TTC CTG GCC CGG GAG AAG GCG AAG GCC 857 

DAECYTALKIAEANKLKLTP 278 
GAC GCT GAG TGC TAC ACA GCC CTG AAG ATC GCA GAA GCA AAT AAG CTC AAG CTG ACT CCA 917 

EYLQLMKYKAIASNSKIYFG 298 
GAA TAC CTG CAG CTG ATG AAG TAC AAG GCC ATT GCT TCC AAC AGC AAG ATT TAC TTC GGC 977 

KDIPNMFMDSAGGLGKQFEG 318 
AAA GAC ATC CCC AAC ATG TTT ATG GAT TCC GCA GGG GGG CTG GGC AAG CAG TTT GAG GGG 1037 

LSDDKLGFGLEDEPLEAPTK 338 
CTG AGC GAC GAC AAG CTG GGC TTT GGC CTA GAA GAT GAG CCC CTC GAG GCA CCC ACA AAG 1097 

EN * 341 
GAG AAC TGA 1106 

GGAAACACTGTCTGCAAGCTCTGCTCGGGCAGCTTAGAGACAGCTGTATTCTTTAAGATGAGACAGAGCAAAGCGCTCC 1185 

TCCTTTCCACACTACCTTCCTTGACTCTTCTTACTGTGGTTAAAAAGGAAGAAATGGACACAAACTTACCCCCTTCTGG 1264 

GAAGGGAGAGCAGATGGAGAGTTGTTTTTTGGGTTTATTTTTAATTCAGGTAAGTAAGTTGTATGACTTCTGAGAAGGT 1343 

GTATGCACCGTAGATTTGACCTCTGACCTGCAGACACCAACATTGTCACTTTGAAGCTGGTTTAAGTGGAGCTACTGTC 1422 

AGTATGAAGAGCGAGAGTGTGTGCTGCCTCCTCGTGCTTGAATTCCTTCAGGGAAAAGTGTACTCCACAGTTCTCTCCC 1501 

TTGCCTCTAGTGTAGGCAGTGTCTGCGTGTGGGGCTCGTGACAGAAGGCCGTCTGCTCCGGAACATGAGCTGCAGAGAG 1580 

CGTTGGCCGGCTGGGCTTTTTGACTGAGTGGATTACTTGACACTTAACCTGTCTTGAGCCCTTTTTAGCAAGAACTTGG 1659 

TGCTAGGTTTTGCAAGGTTTTCTACACACTGTACTCTGCTCTAGTGTTTCTTGGCTACATCTCACCGCAGCAGGGCTTG 1738 
GTCACACCACACACTCCTTTTCCCTACTTTGACCTGATCTCTGATTTCATTTCTTCTTGAATAATCTATTCATCAGTTC 1817 
CACTCAGCGTTAAGATGGGAACAAACAAGTGCTGTTAGCTGATGACGTAGCTCCTTATACCCCTTAGCACTGTGGTGCT 1896 
GTGTGGCTAATTATGCGTATGCTTTTGAGACCAAACATCTTTATCATTATGGAGATTCTTCATTGAAGAGCCCTTAACA 1975 
CTGTGGAGAAGGGCCCAGCCAGATGACACCCAAGTAGTAGTGCCTGTGGCCTGTGCTGGGGCTTTGTCTGACACTGATG 2054 
AAGAGAGCAGGCAGCCACTTGAGAGTCGGCTCCAGTGAGTCACCCTAGGAAACTGAGAATGCGAAGAATAGATATGAGA 2133 
GAAAGGGATTTCTTATCCTGAAATTGCACTGGGGGTGGGGCTCTACCATGGCCTGTGAGTGCACACAGAATGCCTCTGT 2212 
GGAGGGCAGCTCTGCAGGTAATCTGCAGACATGGCAGTACCCTGTGCAACCATGACTGGCTCTAGCTTAGGACTTGGCC 2291 
TTGTTAGCTGGTCCCCTACCTCATCCTCCCCCCACACAAAGCACCTACTGTTCTCTCTTAGGTGACTACTATAAATGGT 2370 
ATTTTCTGGCATCAATTCCCACCTCAGTTTTGGTTTTGTAAGTCGGGCCAGTTTGCTCCTAAGTGGCACCAGACTTGTC 2449 
AGGTATTTGGGAAGCATTCAGCCGACCCAAAAAGAGGCAGGGTTCACTGTGCTTACTTCACATGTTCCCTTCTCTGTCC 2528 
TGACTCCTCAGGCCACTGACCCTGGCCACACTGTACAAACTACAAAATGTTCCTGAAAAGGACATTTTAATGTGCTCAA 2607 
AAGCTCTTGCAAAAGTGGGTTTTTTTTCCCCAAGACCAACTCATCTTCTTCTCATTTGTTGCTGCTAACCACTTGTTGA 2686 
GAGCAACGTCCTATACCCAGCATCCTCTCTTGTACGTGCACCTGAGAAAACACTACTTCAGTGCAGTCGGTGCAGGAGG 2765 
GAGGGTACCCCGCCATCCAGCGCCCTCCTAGCCCGAGAGGCTCTGTAACTAGCATTCTGAGAGCTCATCCCTCCATTAC 2844 
AAAGAGCCACAGTAAAGTCCTGCTGCAGCTGCTCCTTCCCTGCCCCTTTAATGTCACTTCTTTAACAGAACAGAAATGT 2923 
CCCCATGTCATAGCATAAATTCAGTAGCTATTGGTATCTGTCCCACCAGTAAAATCATGGAACTCAGATGTCTTTTTAG 3002 
CATGGGATGCCTAGCCCATCTGTCTTTATGACCTTGTTTTTTGTAATACTATAAAATCTGACTTAGGCATTTGAATTCT 3081 
AAACATGTAAAATGTGATAAGCCTGCAGTTTTGTAGGCAGTGAATTCATAGCTGCTATTTTTAAGTAGAACTTCTATCA 3160 
AAATACGTTAACCGTTTGTAAAATTCAGTTTTTGTAGGACTTTCCCAAGGCCCAGCCACCTTGGTAGAATGCTTCTCAC 3239 
TCACTAAATGTTGCAGAAGCAATTTATATTCCATATAGGTTTTTAATCACTTTTCAATATATGGTTAGAATGTTTGTAA 3318 
GGAAGCCTAAGTTTAATAATTTTTATATAACTAAAAATAGGTGTGGAGGACTCAGTGTGGGTACTGAGGAGGAATGAAG 3397 
TGCTCTGAAAAGGGAGGTGTATAAACGGCCTGTGGGGCCGTGTGTCTTGTGAAAGTCAGATAGCCGTGCTTACTGACCT 3476 
GGGCTGTCGTCAGCTGGCCGTCGGTAAACTACCTGGACAATAGCCCCTCTGTCTGGGAACTTTACCTACTTCCTTGTCC 3555 
TCAGTGGGCTTCTAGCCACTGTTTGTTTCCTTATAAAAGCTGTAATGGGCAATCATGTGTTTGTACTTCCATTCCTTTT 3634 
TATCTCTACTTCTGTGTAAACTGGTGATTGAATAGTTAAAGCAATTTTTTCAGTGTGCCCCAAGGGCATTAATGAGCCT 3713 
TTATAACTGAGAAATGATTCTTGTTATAGTAATTATTCCATAAATGATACCACTAGATAAATTACCTTGGGTTAATAGC 3792 
TCCAGGATTTGTTTCAGACAACAAAAAAAGGTCTCAATGTGAATATACTTACATTTTGGATTTAATTTCAGTCTTGCTA 3871 
AATAAAATGTTTTTGTCTTTTTTTGATTAAGGTAAAAAAAAAAAAAAA 3919 
Input file T182mouse; Output File T182mouse.pat 
Sequence length 3087 

MNMTQARL 8 

GGAACCCCGCGTCCGCNGATGCGTCACTGACCCGAGGAACAAGG ATG AAT ATG ACT CAA GCC CGG CTT 68 

LVAAVVGLVAILLYASIHKI 28 
CTG GTG GCT GCA GTG GTG GGG TTG GTG GCG ATC CTC CTG TAC GCC TCC ATC CAC AAG ATC 128 

EEGHLAVYYRGGALLTSPSG 48 
GAA GAG GGA CAC TTG GCC GTG TAC TAC AGG GGA GGA GCT TTG CTA ACG AGC CCC AGT GGA 188 

PGYHIMLPFITTFRSVQTTL 68 
CCA GGC TAT CAT ATC ATG TTG CCT TTC ATT ACA ACA TTC AGA TCT GTG CAG ACA ACA CTA 248 

QTDEVKNVPCGTSGGVMIYI 88 
CAA ACG GAT GAA GTT AAA AAT GTG CCT TGT GGA ACA AGT GGT GGA GTC ATG ATC TAT ATT 308 

DRIEVVNMLAPYAVFDIVRN 108 
GAC CGA ATA GAA GTG GTT AAT ATG TTG GCT CCT TAT GCA GTG TTT GAC ATT GTG AGG AAC 368 

YTADYDKTLIFNKIHHELNQ 128 
TAT ACT GCA GAC TAC GAC AAG ACT TTA ATC TTC AAT AAA ATC CAC CAT GAG CTG AAC CAG 428 

FCSAHTLQEVYIELFDQIDE 148 
TTT TGC AGT GCC CAC ACA CTT CAA GAA GTT TAC ATA GAA TTG TTT GAT CAA ATA GAT GAA 488 

NLKQALQKDLNTMAPGLTIQ 168 
AAC CTG AAG CAG GCC CTG CAA AAA GAT TTA AAC ACC ATG GCC CCA GGT CTC ACT ATC CAG 548 

AVRVTKPKIPEAIRRNFELM 188 
GCT GTG CGT GTT ACA AAA CCC AAA ATC CCA GAA GCC ATA AGA AGA AAT TTT GAA TTA ATG 608 

EAEKTKLLIAAQKQKVVEKE 208 
GAG GCA GAG AAG ACA AAA CTT CTC ATA GCT GCA CAG AAA CAA AAG GTG GTG GAG AAA GAA 668 

AETERKRAVIEAEKIAQVAK 228 
GCT GAG ACG GAG AGG AAA AGG GCT GTT ATA GAA GCA GAG AAG ATT GCA CAA GTA GCA AAA 728 

IRFQQKVMEKETEKRISEIE 248 
ATT CGA TTT CAA CAG AAA GTG ATG GAG AAA GAA ACT GAA AAA CGC ATT TCT GAG ATT GAA 788 

DAAFLAREKAKADAEYYAAH 268 
GAT GCT GCG TTC CTG GCC CGA GAG AAG GCA AAA GCA GAT GCC GAG TAT TAC GCT GCA CAC 848 

KYATSNKHKLTPEYLELKKY 288 
AAA TAC GCC ACC TCA AAC AAG CAC AAA CTG ACC CCA GAG TAT CTG GAG CTC AAG AAA TAC 908 

QAIASNSKIYFGSNIPSMFV 308 
CAG GCC ATT GCC TCA AAC AGT AAG ATC TAC TTT GGC AGC AAC ATC CCC AGC ATG TTT GTG 968 

DSSCALKYSDGRTGREDSLP 328 
GAC TCC TCC TGT GCT CTG AAA TAC TCT GAT GGT AGG ACT GGG AGA GAA GAC TCC CTT CCC 1028 

PEEAREPSGESPIQNKENAG 348 
CCA GAG GAG GCC CGT GAG CCC TCT GGA GAG AGC CCC ATC CAA AAC AAG GAG AAC GCA GGT 1088 

349 

TGA 1091 

TGCAAGAGGTGGAAATGTTCTCCCATATCAAGATGCGACCCAAGGGGCTAAGTGGGAACAGTGGTTATGTGGACTCGTA 1170 

AGATTCACAGAGAATGTGTGCTCTGTTGTGATTCTCTTGTCATAGTCCTGCTTTGCCAGCTGACTACAGGATAGACCCA 1249 

GCTGTCTGGCACTCAAACGGTCTCTGCAGCCACAGTTTTATCAAGTATCCTGTATGTGTTCCTTTGTAAACCGGTACTC 1328 

ATGAATGAGGGAAAGTCTCATGCTAAGATACTGCCTGCACTGGAATCTCAAACACTATATAACAAGCTGTGGTTTTTAA 1407 

AAGCTATTGAATAATGTTTACATTGGTCCCTGAGGACATGTGTGCTCAGACATTCAAGAGCTAGGAGGCCAGAGAGAAG 1486 

ACCTTCAGAAAACGGTAAGTTAAAGAAGACAAGTGTCATCAGACACTTGGGACCCCGGCTCTCTTTAAAGTCTAGTCCC 1565 

GGCATTCCTCCATGTGATTGACAGCCAGACCTCTGGGTTCCCACGAAATTATCTTCCAGTTGAATGACCATTTACTTGA 1644 

TACAAATTGTACCTTTCTGTTTTTCTAGTCAGGTTGGTGGCCTGCAGGGACCCGTACTTTGCCACCCGACCACAGGTTC 1723 



FIG. 64 




CTCGAAGATATTCCCAATCACTACTTTATTGCGTTAGGAGACTCAGAGATATACAAAGCACCTGAAATTTAAGGGAGAT 1802 
AAAGCCTGCACTGCACCAAAGCTACGGGTCCCTGTGTTTCCTCTATTCAGTGATGTCATCAACCTCACTGTCCCAGCCC 1881 
ATGTGTGACTAAAGTGCCCGGTTTTAGCCACAGACAACTGCTTAGATGTCACCTCTTGGCTGACCAAAGCTGGGACAGG 1960 
GCTTTAACCAGACATAGGAGCAGTGTGCAATTCCTGATTCACTGCACAGTATTATGTCATAATTGCAGGAATTATTTTT 2039 
TGTTTTTAAAACTGGATTTGGGGCACATTCATTCACCCCAACACTTCTATCTAAAGGCCAAGGTTCTAGGGCTGCTATG 2118 
GTCACTAACACACTGATTCTCCTTAAAGTAATTCTCGAAGTGTGGAACAAAGTGACCGAGACAGCATCCTCAGTCATCT 2197 
TTGTCTCCTTCCCTGGGATGCAGATACCGAAGTTGCTTTTCCAACTTTCGCCTCCGCTAGGAGATCAGAAAGAATTCTT 2276 
GTGACTTCCTGGGCAGCCATTGAATTCATTTTCCATGAGAAGATGACAGAGTTAGCCTGTGGCTATAGGAGATCATGTC 2355 
ATCCAGACCTTTTTGCCCATCACATTAACTTTCCTGGAATATTGTGCTGCACAGGTAGACCTGAATCTGCCCAGCTTGT 2434 
TGACAGCTCTTGTGTATACTGTGTTGAAGCCAGACAGAAAAGTAATGGGGCCACTTCTGAAACCTCTCAGCTGTTGATC 2513 
TCACAGCAGCTAAAGGGTTGTGCCAAACATTTTATTAAGAAAGTAAAGCCCAGATTTGAATGGGGGTTTTCCCTAGGCC 2592 
TTATAGTATAGAGGCATTTGTAATATGGAGAAAATAATTTTTCTCATTTAATTATAGAAATTACCTTCAAACAGATTTT 2671 
GTGTTCTTTGGCCCTTCAAATACTGGTGTTACATTGTTGCTGCAGATAAATGATGATTGTCGTGGGATATCTGGATCAC 2750 
TGAGCTCTGTGCTTTCATTCCTAGAGATGTTTCTCATTCCCATTTAGTGAAATGCTGTTGCCCCAAAGTGATGGTTGTG 2829 
GGATTTCTTACCGGTCATAGGCCCCGGTGAGGAGCAGGGAAGCGCCATTGTGAAAGATTAAAGAAACCACTTCCACTTG 2908 
AGCTCCTTATGGAGTGAGCTTCCCTGTGCCCACTCAGTGAACTAAGTCTGACCATCCTTCAGGGACGTTCCTTTTGGTA 2987 
AATATACACTGTAATCTTTAAGTCTAAATTTATATGTGAAAGTTAACTTTTTTTAAAAACCTAAATAAAATTATTTTCC 3066 
TATCAAAAAAAAAAAAAAAAA 3087 
Input file T187Aymue064g11; Output File T187Aymue064g11.pat 
Sequence length 2883 

GTCCACGAAAAAGCTGCTTGCACTAGGCCCATCCCCCCTGCCTGGTGAAAGGAACCCCAGCACACACGCTGCGAGGGCT 79 

TCCCATTTTAGCAGGGCGGCTTCCGGAAGGCGGAGCTCCAACCCCATTTCCTTTCTCTGGGCTGGTTCTGGCCCAGCTC 158 

M G G A R D 6 
CACCTGCGTGTGGCCCTGGCTCCTCGGCTCCCTGCAGCTCCGAGGCAGCAGC ATG GGT GGC GCG CGG GAC 228 

VGWVAAGLVLGAGACYCIYR 26 
GTG GGC TGG GTG GCA GCA GGG CTG GTC CTG GGC GCC GGC GCC TGC TAC TGT ATC TAC CGG 288 

LTRGPRRGGRRLRPSRSAED 46 
CTG ACT CGG GGA CCG CGG CGA GGC GGT CGC CGA CTG CGC CCT TCG CGA TCC GCA GAA GAC 348 

LTDGSYDDILNAEQLKKLLY 66 
CTA ACC GAT GGC TCC TAT GAC GAT ATC TTA AAT GCA GAG CAG CTT AAG AAA CTT CTG TAT 408 

LLESTDDPVITEKALVTLGN 86 
CTG CTG GAG TCA ACC GAC GAT CCT GTC ATT ACT GAA AAG GCC TTG GTC ACC TTG GGA AAT 468 

NAAFSTNQAIIRELGGIPIV 106 
AAT GCA GCC TTC TCC ACT AAC CAG GCC ATT ATT CGT GAG TTG GGT GGT ATC CCA ATT GTT 528 

GNKINSLNQSIKEKALNALN 126 
GGA AAC AAA ATC AAC TCC CTG AAC CAA AGT ATT AAA GAG AAA GCT TTA AAT GCA CTG AAT 588 

NLSVNVENQTKIKIYVPQVC 146 
AAC CTG AGT GTG AAT GTT GAA AAT CAA ACT AAG ATA AAG ATA TAC GTC CCT CAA GTC TGT 648 

EDVFADPLNSAVQLAGLRLL 166 
GAG GAC GTC TTT GCT GAC CCC CTG AAC TCT GCG GTG CAG CTG GCC GGA CTG AGG CTG CTG 708 

TNMTVTNDYQHLLSGSVAGL 186 
ACA AAC ATG ACG GTC ACC AAC GAC TAT CAG CAC CTG CTC AGC GGC TCC GTC GCT GGC CTG 768 

FHLLLLGNGSTKVQVLKLLL 206 
TTC CAC CTG CTG CTG CTG GGA AAC GGA AGC ACC AAG GTC CAG GTT TTG AAG CTG CTT TTG 828 

NLSENSAMTEGLLSVQVSRL 226 
AAT TTG TCT GAG AAT TCA GCC ATG ACA GAA GGA CTA CTG AGT GTC CAA GTA AGT AGA TTA 888 

PTRFISAHIQRF* 239 
CCT ACC CGG TTC ATT AGT GCA CAC ATA CAG AGA TTT TGA 927 

CAAATAGATCTGCAAAGGTATGCCCAAAAACATTCACAGGAATTATTTCTGAAGATGAGTATTAAGCATATTTTGTTTT 1006 

TTAAAACTTCTCTGTGGCACCAGCAGACTTTCCATCTCTGCCCACTTTGCAGTATTTTTCTGTCACTGCATTTTAAAGT 1085 

TTGTTTTTTTTGTGCATGTGTACCTCAGCATTTGCTGAAACAACTGTACTGAGTGAGTCCCCTGTGTGGGCTCGGTCCT 1164 

CAGCATTCAGCCAGCACCACCAAGTTCTTAGTGTTCCCATGGAACTTAGGAGAAGCAACCATGTAACAAATTAGCAAGA 1243 



CTGTTGAAAACATGTAACAAACCATTGAAACAGTCCCTGTGCTCTGAAGAACGCCAGGCCGTGTGAGCCGTCTGCAGAA 1322 
ATCGAGCCATCTGCTCCGTCCTGTTACCAGAACTGTGTGTAAGAGCTAATGCTGATTGAACTAATGTGTTCTTACAAAA 1401 
ACTGGATAGATCCTAAAGGGGTTGGTTTCCCAAATGGCTACACTCTGGAGTTCCAAAGAAATCTTAGTTTTTCCCCTAA 1480 
CAAAACGTCATTTTCACTTGTAACATCGAATAAAAATGAAACATGTCCCTTACGCTTGCCTGGAGTCAGACTTTTACAG 1559 
TCTTAACTAATCCATGCTGTTTTAAAATAGGACAGTGACGCTGTTTCCTCTTTCAGGTGCATTCTTCATTCCTTTCCCT 1638 
TTATGACGGCCAAGTAGCAAATGAGATTCTTCTTCGGGCTCTTACACTGTTTCACAATATAAACAACTGCCTCAAAGTG 1717 
GAAGGCCGGTTAGCTAATCAGATTCCTTTTGCTAAAGGGTCATTGTTTTTTCTGTTATACGGAGAAGAATGTGCCCAGA 1796 
AAATGAGAGCTTTAGCCTGTCATCATGATGTGGATGTGAAAGAGAAAGCTTTAGCAATAAAGCCGAAATTCTGATCGGT 1875 
TGCTCCTATTTTTATCAAACACTCAAACAGTAAGGCAGTCTTAAGTCACCACACGGGAGCGTTTGCCTGCCTTTAAAAG 1954 
CCGTCTTTCAGCCGATGGAGTTAAACAATAAAAGTGAGTGAGCAGCTCTAATCCAACACGATGTTCAAAATTTTAGATT 2033 
TTGGAGTAGTTCAGATTTCGGGTTTGGGGATTGAGTAGAGTCTGGAACCTTCCGAGGATGTGGATCATTTACGCGGCAA 2112 
ACGTTTGGTTATGATCGTCGACACACTGGCCATCCTCTTCAGCACTATTTGAACGATTCTAGTCCTAGTGAATGAATAT 2191 
GAGGGGCTGTACTGAAGATACTTGCTGAGGTATTTAATGGTTTCCTGACACGAACTGAGTGGCCTGTCTCTGTACAATC 2270 
CTAACTCCTGGGAGCATTTGCAGTTGCTCATGAGACAGCGTTAAGTGCTGAGTTGAAGTCTGTTACTGCCACAGCAAGG 2349 
ACCTTGTGCCTCAAACCAGTGAATACTGCAAGCTCGAGTCCACCACCAACCCTGCCATGCTGCTTCCAAGTCTGAGCTC 2428 
ATCGTGAGACACTGCCTGCAGCATTTCTGATCAGTAGGACTGTACTCCCATTTACATGGAAAGCGTTTTCTTACTGCTT 2507 
ACCCCCTTGTGTAAGATACTGCAGAGCACTCCAAGCTTCCACCCACAGGCAGACAGCCCTTTAAAAAAGAGTGTCCTGA 2586 
TAAGTCCAGATGGATACATGGAGAAACATACCCATGAGATGGCTGCTTTGAAAGCATGCTGGGAAGCAATGTATTAGGG 2665 
TCCCGTGTCTTTTTTTTCTCTCAGTAATGATAAATACACTTATACATGGACAGAACATTTCTAGAACGATTCAGAAAAC 2744 
TTCTGGGACTGGGACTAGGGTACATAGATTTCTTTGTGTTCCTGTTTCTACCGTTTGGATTTGTACTGAGCATAAATTG 2823 
TATAATTTTTTAATAAAAAGGAAAAATGCAAGGTGTACATAAAAAAAAAAAAAAAAAAAA 2883 
Input file T215AtmX215; Output File T215AtmX215.pat 
Sequence length 2744 

MELDRWAQLGLV 12 
CTCGGTACCGACACAGCAACGGGAAACG ATG GAG CTA GAC AGA TGG GCG CAG TTG GGG CTG GTG 64 

FLQLLLISSLPREYTVINEA 32 
TTC CTG CAG CTC CTT CTC ATC TCA TCG TTG CCA AGA GAG TAC ACG GTC ATT AAT GAA GCC 124 

CPGAEWNIMCRECCEYDQIE 52 
TGT CCC GGA GCT GAG TGG AAC ATC ATG TGT AGA GAA TGT TGT GAA TAT GAT CAG ATT GAA 184 

CLCPGKKEVVGYTIPCCRNE 72 
TGC CTC TGC CCA GGA AAG AAG GAA GTG GTG GGT TAC ACC ATC CCA TGC TGC AGG AAT GAG 244 

DNECDSCLIHPGCTIFENCK 92 
GAT AAT GAA TGT GAC TCC TGT CTA ATT CAC CCA GGT TGT ACC ATC TTT GAA AAC TGC AAG 304 

SCRNGSWGGTLDDFYVKGFY 112 
AGC TGC CGC AAT GGC TCC TGG GGC GGA ACT CTG GAT GAC TTC TAC GTG AAG GGA TTC TAC 364 

CAECRAGWYGGDCMRCGQVL 132 
TGC GCA GAG TGC AGG GCA GGC TGG TAC GGA GGA GAC TGC ATG CGA TGT GGC CAG GTT CTT 424 

RASKGQILLESYPLNAHCEW 152 
CGA GCC TCA AAG GGT CAG ATC TTG TTG GAG AGC TAT CCC TTA AAC GCT CAC TGT GAA TGG 484 

TIHARPGFIIQLRFGMLSLE 172 
ACT ATT CAT GCC AGA CCT GGG TTT ATC ATC CAG TTG AGG TTT GGT ATG CTG AGC CTA GAG 544 

FDYMCQYDYVEVRDGDNSDS 192 
TTT GAC TAC ATG TGC CAA TAT GAC TAT GTG GAG GTC CGC GAT GGG GAT AAT AGT GAC AGC 604 

PIIKRFCGNERPAPIRSTGS 212 
CCT ATC ATC AAG CGT TTC TGT GGC AAC GAG AGG CCA GCT CCC ATC AGG AGC ACT GGC TCT 664 

SLHVLFHSDGSKNFDGFHAV 232 
TCA CTC CAT GTC CTT TTC CAT TCT GAT GGC TCC AAG AAC TTC GAT GGC TTC CAC GCT GTC 724 

FEEITACSSSPCFHDGTCLL 252 
TTT GAG GAG ATC ACA GCG TGC TCC TCA TCC CCT TGT TTC CAT GAT GGC ACA TGC CTC CTT 784 

DTTGSFKCACLAGYTGQRCE 272 
GAC ACC ACT GGG TCT TTC AAG TGT GCC TGC CTG GCT GGC TAC ACT GGG CAG CGC TGT GAA 844 

NLLEERNCSDLGGPVNGYKK 292 
AAT CTA CTT GAA GAA AGA AAC TGC TCA GAC CTT GGG GGG CCA GTC AAT GGG TAC AAG AAA 904 

ITEGPGLLNERHVKIGTVVS 312 
ATC ACA GAA GGT CCT GGA CTT CTC AAT GAG CGC CAT GTA AAA ATT GGC ACG GTT GTG TCT 964 

FFCNGSYVLSGNEKRTCQQN 332 
TTC TTT TGT AAC GGC TCA TAC GTT CTG AGT GGC AAT GAG AAA CGA ACT TGC CAG CAG AAT 1024 

GEWSGKQPVCMKACREPKIS 352 
GGA GAG TGG TCA GGA AAG CAA CCT GTC TGC ATG AAA GCC TGC CGG GAA CCG AAG ATC TCA 1084 

DLVRRRVLSMQVQSRETPLH 372 
GAC CTG GTG AGA AGG AGA GTC CTT TCG ATG CAG GTT CAG TCA AGG GAG ACA CCA TTA CAT 1144 

QLYSTAFSKQKLQDASTKKP 392 
CAG CTT TAT TCC ACG GCT TTC AGC AAG CAG AAA TTG CAG GAT GCC TCT ACC AAA AAG CCA 1204 

ALPFGDLPPGYQHLHTQVQY 412 
GCC CTT CCA TTT GGA GAC CTG CCC CCT GGA TAC CAA CAT CTG CAC ACC CAA GTC CAG TAT 1264 

ECISPFYRRLGSSRRTCLRT 432 
GAG TGC ATC TCG CCC TTC TAC CGC CGC CTG GGA AGC AGC AGG AGG ACA TGC CTG AGA ACT 1324 

GKWSGRAPSCIPICGKIEST 452 
GGG AAG TGG AGT GGG CGG GCC CCG TCC TGT ATC CCA ATC TGT GGA AAA ATC GAG AGC ACT 1384 

PSPKTQGTRWPWQAAIYRRT 472 
CCT TCT CCA AAG ACC CAA GGC ACC CGC TGG CCA TGG CAG GCA GCC ATC TAC CGG AGG ACC 1444 
SGVHDCGLHKGAWFLVCSGA 492 
ACT GGT CTA CAC GAT GGT GGT CTG CAC AAA GGT GCA TCG TTC TTG GTC TGC ACT GGT CCC 1504 

LVNSATVVVAAHCVTELGKA 512 
CTG GTG AAT GAA CGG ACT GTG GTT GTG GCT GCC CAC TGT GTG ACT GAG CTG GGG AAG GCC 1564 

TIIKTADLKVVLGKFYRDDD 532 

ACC ATC ATC AAG ACA GCA GAC CTC AAG GTT GTC TTG GGA AAA TTC TAC AGG GAC GAT GAT 1624 

RDEKSIQNLRVSAIILHPNY 552 

CGG GAT GAG AAG AGC ATC CAG AAT TTA CGG GTT TCT GCT ATC ATT CTG CAC CCC AAC TAT 1684 

DPILLDTDIAVLKLLDKARI 572 

GAC CCT ATC CTG CTT GAC ACT GAC ATC GCT GTT CTG AAG CTC CTA GAC AAA GCT CGC ATC 1744 

STRVQPICLATTRDLSTSFQ 592 

AGT ACC CGT GTC CAA CCC ATC TGC CTG GCT ACC ACT CGG GAC CTC AGC ACC TCT TTC CAG 1804 

ESHITVAGWNILADVRSPGF 612 

GAA TCC CAC ATC ACT GTG GCT GGC TGG AAC ATC CTG GCA GAT GTG AGG AGC CCT GGC TTT 1864 

KNDTLHVGMVRVVDPMLCEE 632 

AAG AAT GAT ACC TTA CAT TAT GGA ATG GTC AGA GTG GTA GAC CCA ATG CTT TGT GAG GAA 1924 

QHEDHGIPVSVTDNMFCASK 652 

CAC CAT GAA CAC CAT GGC ATT CCA GTT AGT GTC ACT GAC AAC ATG TTC TGT GCC AGC AAA 1984 

DPSTPSDICTAETGGIAALS 672 

GAT CCC AGT ACC CCT TCT GAC ATC TGC ACT GCA GAG ACA GGG GGC ATC GCT GCT TTG TCC 2044 

FPGRASPEPRWHLVGLVSWS 692 

TTC CCA GGC CGA GCA TCC CCC GAG CCA CGC TGG CAT TTG GTG GGG CTG GTC ACC TGG AGC 2104 

YDKTCSNGLSTAFTKVLPFK 712 

TAT GAC AAG ACA TGT AGC AAT GGC CTA TCC ACA GCC TTC ACA AAG GTG TTG CCG TTC AAA 2164 



DWIERNMK* 721 
GAC TGG ATT GAC AGA AAC ATG AAA TGA 2191 



ACCAGCCACAAGGCCACTGAGAAGCCTTTTCCTAGCATCCGTCTGTACATATGTTGTATAGAACAATGCGGGCCTGAAG 2270 

TGTAATTTTGCCCACCATCTTGGCTACTGAAAGGCTCCTCGTTTCAGGGACTTATCTCAATAGAGGGTGAACAGAGTTT 2349 

ACTTCATCACGGAACTGTCTCCCTGACTGCTTGGGAATCATCTAAAAGATGCCAGGTCTTGCAACAACTGGATTTCTTC 2428 

AAAGAAGACCATGTGACTAGAAGGAGAACCTCTTCCTCCTGCTCCACTCAGAGTGATGTGACTGTCAATCAGTTTGGGT 2507 

TGAGAAGGTTGATTTGGGGAGGCCTGGGCTGCACCTGGCTTCTGTCAAAGTTCCAAAGAACAAACAACTTAGACTAGCC 2586 

CAGGGCAAAGCAGATTGGGTGTGGCACCCTGTGTAAATTGTCACAAGATTGTCTGATCCTTTCCCTTTCCAATCTTCTC 2665 

TACACATTTCAATAAAACAAGGTCTGCTCCCTGACCTACCAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2744 